COMPUTER VISION CAPSTONE PROJECT AIML OBJECT DETECTION - CAR
AWS Specific code
Installing libraries
!pip install tensorflow opencv-python pillow scikit-learn
Requirement already satisfied: tensorflow in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (2.16.2) Requirement already satisfied: opencv-python in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (4.11.0.86) Requirement already satisfied: pillow in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (11.1.0) Requirement already satisfied: scikit-learn in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (1.6.1) Requirement already satisfied: absl-py>=1.0.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (2.1.0) Requirement already satisfied: astunparse>=1.6.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (1.6.3) Requirement already satisfied: flatbuffers>=23.5.26 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (25.2.10) Requirement already satisfied: gast!=0.5.0,!=0.5.1,!=0.5.2,>=0.2.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (0.6.0) Requirement already satisfied: google-pasta>=0.1.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (0.2.0) Requirement already satisfied: h5py>=3.10.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (3.12.1) Requirement already satisfied: libclang>=13.0.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (18.1.1) Requirement already satisfied: ml-dtypes~=0.3.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (0.3.2) Requirement already satisfied: opt-einsum>=2.3.2 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (3.4.0) Requirement already satisfied: packaging in 
/home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (21.3) Requirement already satisfied: protobuf!=4.21.0,!=4.21.1,!=4.21.2,!=4.21.3,!=4.21.4,!=4.21.5,<5.0.0dev,>=3.20.3 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (4.25.6) Requirement already satisfied: requests<3,>=2.21.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (2.32.3) Requirement already satisfied: setuptools in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (75.8.0) Requirement already satisfied: six>=1.12.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (1.17.0) Requirement already satisfied: termcolor>=1.1.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (2.5.0) Requirement already satisfied: typing-extensions>=3.6.6 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (4.12.2) Requirement already satisfied: wrapt>=1.11.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (1.17.2) Requirement already satisfied: grpcio<2.0,>=1.24.3 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (1.70.0) Requirement already satisfied: tensorboard<2.17,>=2.16 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (2.16.2) Requirement already satisfied: keras>=3.0.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (3.8.0) Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (0.37.1) Requirement already satisfied: numpy<2.0.0,>=1.23.5 in 
/home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorflow) (1.26.4) Requirement already satisfied: scipy>=1.6.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from scikit-learn) (1.15.1) Requirement already satisfied: joblib>=1.2.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from scikit-learn) (1.4.2) Requirement already satisfied: threadpoolctl>=3.1.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from scikit-learn) (3.5.0) Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from astunparse>=1.6.0->tensorflow) (0.45.1) Requirement already satisfied: rich in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from keras>=3.0.0->tensorflow) (13.9.4) Requirement already satisfied: namex in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from keras>=3.0.0->tensorflow) (0.0.8) Requirement already satisfied: optree in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from keras>=3.0.0->tensorflow) (0.14.0) Requirement already satisfied: charset_normalizer<4,>=2 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorflow) (3.4.1) Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorflow) (3.10) Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorflow) (1.26.19) Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from requests<3,>=2.21.0->tensorflow) (2025.1.31) Requirement already satisfied: markdown>=2.6.8 in 
/home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorboard<2.17,>=2.16->tensorflow) (3.7) Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorboard<2.17,>=2.16->tensorflow) (0.7.2) Requirement already satisfied: werkzeug>=1.0.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from tensorboard<2.17,>=2.16->tensorflow) (3.1.3) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from packaging->tensorflow) (3.2.1) Requirement already satisfied: MarkupSafe>=2.1.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard<2.17,>=2.16->tensorflow) (3.0.2) Requirement already satisfied: markdown-it-py>=2.2.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from rich->keras>=3.0.0->tensorflow) (3.0.0) Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from rich->keras>=3.0.0->tensorflow) (2.19.1) Requirement already satisfied: mdurl~=0.1 in /home/ec2-user/anaconda3/envs/tensorflow2_p310/lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich->keras>=3.0.0->tensorflow) (0.1.2)
import os
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
!nvidia-smi
Sat Mar 29 10:40:41 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03 Driver Version: 550.144.03 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A10G On | 00000000:00:1E.0 Off | 0 |
| 0% 33C P8 16W / 300W | 1MiB / 23028MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
import boto3
def download_files_from_bucket(file, bucket):
    '''
    Download a file from an S3 bucket to the local instance.
    The object key is reused as the local file name.
    '''
    bucket_name = bucket
    file_key = file
    local_file_path = file
    s3 = boto3.client('s3')
    s3.download_file(bucket_name, file_key, local_file_path)
    print(f"File downloaded to {local_file_path}")
download_files_from_bucket('stanford-car-dataset-by-classes-folder.zip','pgp-capstone-project')
File downloaded to stanford-car-dataset-by-classes-folder.zip
zip_file_path = 'stanford-car-dataset-by-classes-folder.zip'
!unzip -oq stanford-car-dataset-by-classes-folder.zip
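The `!unzip` shell command works in a notebook but not in a plain Python script. A minimal sketch of the same extraction step using the standard-library `zipfile` module follows; the helper name `extract_archive` and its `dest_dir` parameter are illustrative and not part of the original notebook.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract every member of a zip archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # extractall overwrites files that already exist, similar to `unzip -o`
        archive.extractall(dest)
        return archive.namelist()


# Example call (commented out because it needs the downloaded archive):
# extract_archive("stanford-car-dataset-by-classes-folder.zip")
```

Returning the member names makes it easy to confirm that the expected `car_data` folders were unpacked.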
- Problem Statement
Computer vision can be used to automate supervision and trigger an appropriate action when an event of interest is predicted from an image. For example, a camera can identify a car moving on the road by its make, type, colour, number plate, and so on.
Design a DL based car identification model.
- Introduction
The Cars dataset contains 16,185 images of 196 classes of cars. The data is split into 8,144 training images and 8,041 testing images, where each class has been split roughly in a 50-50 split. Classes are typically at the level of Make, Model, Year, e.g. 2012 Tesla Model S or 2012 BMW M3 coupe.
Data description:
‣ Train Images: Consists of real images of cars as per the make and year of the car.
‣ Test Images: Consists of real images of cars as per the make and year of the car.
‣ Train Annotation: Consists of bounding box region for training images.
‣ Test Annotation: Consists of bounding box region for testing images.
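The figures quoted in the introduction can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (8,144 training images, 8,041 testing images, 196 classes). The chance-level accuracy is the accuracy of uniform random guessing and is included only as a reference point for the models trained later.

```python
# Figures quoted in the introduction.
n_train, n_test, n_classes = 8144, 8041, 196

total = n_train + n_test
train_share = n_train / total
avg_train_per_class = n_train / n_classes
chance_accuracy = 1 / n_classes  # accuracy of uniform random guessing

print(f"total images: {total}")
print(f"train share: {train_share:.1%}")
print(f"average training images per class: {avg_train_per_class:.1f}")
print(f"chance-level accuracy: {chance_accuracy:.2%}")
```

With roughly 40 training images per class on average, each class has little data, which is one reason the notebook relies on pretrained backbones.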
- Libraries Used
import os
import zipfile
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt #for visualization
import matplotlib.patches as patches
import seaborn as sns
from PIL import Image # For image loading and manipulation
from pathlib import Path
import cv2
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.class_weight import compute_class_weight
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D, BatchNormalization
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.utils import class_weight
from tensorflow.keras.applications.resnet50 import preprocess_input as resnet_preprocess
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_preprocess
from tensorflow.keras.applications.inception_v3 import preprocess_input as googlenet_preprocess
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications import ResNet50
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)  # Allow dynamic allocation
    except RuntimeError as e:
        print(e)
- Data Handling
4A. Data Handling - Import Data
train_annotations_df = pd.read_csv( "anno_train.csv",header=None)
test_annotations_df = pd.read_csv( "anno_test.csv",header=None)
image_class_df = pd.read_csv( "names.csv",header=None)
train_annotations_df.rename(columns={0:"image_name",1:"xmin",2:"ymin",3:'xmax',4:'ymax',5:'image_class'},inplace=True)
test_annotations_df.rename(columns={0:"image_name",1:"xmin",2:"ymin",3:'xmax',4:'ymax',5:'image_class'},inplace=True)
image_class_df.rename(columns={0:'image_name'},inplace=True)
train_annotations_df.head(5)
| | image_name | xmin | ymin | xmax | ymax | image_class |
|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 39 | 116 | 569 | 375 | 14 |
| 1 | 00002.jpg | 36 | 116 | 868 | 587 | 3 |
| 2 | 00003.jpg | 85 | 109 | 601 | 381 | 91 |
| 3 | 00004.jpg | 621 | 393 | 1484 | 1096 | 134 |
| 4 | 00005.jpg | 14 | 36 | 133 | 99 | 106 |
test_annotations_df.head()
| | image_name | xmin | ymin | xmax | ymax | image_class |
|---|---|---|---|---|---|---|
| 0 | 00001.jpg | 30 | 52 | 246 | 147 | 181 |
| 1 | 00002.jpg | 100 | 19 | 576 | 203 | 103 |
| 2 | 00003.jpg | 51 | 105 | 968 | 659 | 145 |
| 3 | 00004.jpg | 67 | 84 | 581 | 407 | 187 |
| 4 | 00005.jpg | 140 | 151 | 593 | 339 | 185 |
# for images
base_dir = Path(r"./car_data") #replace the directory accordingly
train_images_path = base_dir / "car_data" / "train"
test_images_path = base_dir / "car_data" / "test"
train_images_path = Path(train_images_path).resolve()
test_images_path = Path(test_images_path).resolve()
print(f"train image path is {train_images_path}")
print(f"test image path is {test_images_path}")
train image path is /home/ec2-user/SageMaker/car_data/car_data/train
test image path is /home/ec2-user/SageMaker/car_data/car_data/test
4B. Data Handling - Map Images w.r.t Classes
#Train Images class mapping
#Folder where multiple train images are stored
train_class_folders = [f.path for f in os.scandir(train_images_path) if f.is_dir()]
train_image_classes = {} # Dictionary to store training image: class mapping
train_images_path = list(train_images_path.rglob("*.jpg"))
# Create a dictionary mapping image filenames to class names (parent folder)
train_image_classes = {img_path.name: img_path.parent.name for img_path in train_images_path}
# Build the training DataFrame: one row per image, labelled by its parent folder
df_training = pd.DataFrame(train_images_path, columns=["Image_Path"])
df_training["labels"] = df_training["Image_Path"].apply(lambda x: Path(x).parent.name)
df_training["Image_Path"] = df_training["Image_Path"].apply(lambda x: str(Path(x).resolve()))
print(df_training.head(10))
# --- Print a few mappings to verify ---
print("Sample Training Image to Class Mappings:")
for img_name, class_label in list(train_image_classes.items())[:5]:
    print(f"{img_name}: {class_label}")
Image_Path labels
0 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
1 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
2 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
3 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
4 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
5 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
6 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
7 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
8 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
9 /home/ec2-user/SageMaker/car_data/car_data/tra... Infiniti QX56 SUV 2011
Sample Training Image to Class Mappings:
05829.jpg: Infiniti QX56 SUV 2011
04532.jpg: Infiniti QX56 SUV 2011
04524.jpg: Infiniti QX56 SUV 2011
04856.jpg: Infiniti QX56 SUV 2011
02413.jpg: Infiniti QX56 SUV 2011
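The mapping in this step rests on one convention: each image sits in a folder named after its class, so the parent folder name is the label. A minimal, self-contained sketch of that idea follows. It builds a throwaway directory tree with made-up class names ("Class A", "Class B") so it runs without the real dataset; the helper name `map_images_to_labels` is illustrative.

```python
import tempfile
from pathlib import Path


def map_images_to_labels(root):
    """Return {image filename: parent folder name} for every .jpg under root."""
    return {p.name: p.parent.name for p in sorted(Path(root).rglob("*.jpg"))}


# Build a tiny throwaway tree so the sketch runs without the real dataset.
demo_root = Path(tempfile.mkdtemp())
demo_layout = {"Class A": ["00001.jpg"], "Class B": ["00002.jpg", "00003.jpg"]}
for label, names in demo_layout.items():
    (demo_root / label).mkdir()
    for name in names:
        (demo_root / label / name).touch()

demo_mapping = map_images_to_labels(demo_root)
print(demo_mapping)
```

Keying the dictionary by file name alone only works if file names are unique across class folders within a split; otherwise later entries would overwrite earlier ones.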
#Test Images class mapping
#Folder where multiple test images are stored
test_class_folders = [f.path for f in os.scandir(test_images_path) if f.is_dir()]
test_image_classes = {} # Dictionary to store testing image: class mapping
test_images_path_root = test_images_path.resolve()
test_images_path_list = list(test_images_path_root.rglob("*.jpg"))
# Create a dictionary mapping image filenames to class names (parent folder)
test_image_classes = {img_path.name: img_path.parent.name for img_path in test_images_path_list}
# Build the testing DataFrame: one row per image, labelled by its parent folder
df_testing = pd.DataFrame(test_images_path_list, columns=["Image_Path"])
df_testing["labels"] = df_testing["Image_Path"].apply(lambda x: Path(x).parent.name)
df_testing["Image_Path"] = df_testing["Image_Path"].apply(lambda x: str(Path(x).resolve()))
print(df_testing.head(10))
print("Sample Testing Image to Class Mappings:")
for img_name, class_label in list(test_image_classes.items())[:5]:
    print(f"{img_name}: {class_label}")
Image_Path labels
0 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
1 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
2 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
3 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
4 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
5 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
6 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
7 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
8 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
9 /home/ec2-user/SageMaker/car_data/car_data/tes... Infiniti QX56 SUV 2011
Sample Testing Image to Class Mappings:
01068.jpg: Infiniti QX56 SUV 2011
02434.jpg: Infiniti QX56 SUV 2011
02499.jpg: Infiniti QX56 SUV 2011
04803.jpg: Infiniti QX56 SUV 2011
00478.jpg: Infiniti QX56 SUV 2011
4C. Data Handling - Map Images w.r.t Annotations
train_annotations_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8144 entries, 0 to 8143
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   image_name   8144 non-null   object
 1   xmin         8144 non-null   int64
 2   ymin         8144 non-null   int64
 3   xmax         8144 non-null   int64
 4   ymax         8144 non-null   int64
 5   image_class  8144 non-null   int64
dtypes: int64(5), object(1)
memory usage: 381.9+ KB
# ******** Definition of the method ********
def map_images_to_bboxes(annotations_df):
    """Map each image name to its bounding box (xmin, ymin, xmax, ymax)."""
    image_bboxes = {}
    try:
        for _, row in annotations_df.iterrows():
            # Store bbox as tuple
            image_bboxes[row['image_name']] = (row['xmin'], row['ymin'], row['xmax'], row['ymax'])
    except KeyError as e:
        print(f"Error: Column {e} not found in the annotations DataFrame. Check the column names.")
        print("Expected columns: image_name, xmin, ymin, xmax, ymax")
    return image_bboxes
# Train images bounding box mapping
train_image_bboxes = map_images_to_bboxes(train_annotations_df)
# --- Print a few mappings to verify for training images ---
print("\nSample Training Image to Bounding Box Mappings (DF):")
count = 0
for img_name, bbox in train_image_bboxes.items():
    print(f"{img_name}: {bbox}")
    count += 1
    if count > 5: break
Sample Training Image to Bounding Box Mappings (DF):
00001.jpg: (39, 116, 569, 375)
00002.jpg: (36, 116, 868, 587)
00003.jpg: (85, 109, 601, 381)
00004.jpg: (621, 393, 1484, 1096)
00005.jpg: (14, 36, 133, 99)
00006.jpg: (259, 289, 515, 416)
# Test images bounding box mapping
test_image_bboxes = map_images_to_bboxes(test_annotations_df)
# --- Print a few mappings to verify for testing images ---
print("\nSample Testing Image to Bounding Box Mappings (DF):")
count = 0
for img_name, bbox in test_image_bboxes.items():
    print(f"{img_name}: {bbox}")
    count += 1
    if count > 5: break
Sample Testing Image to Bounding Box Mappings (DF):
00001.jpg: (30, 52, 246, 147)
00002.jpg: (100, 19, 576, 203)
00003.jpg: (51, 105, 968, 659)
00004.jpg: (67, 84, 581, 407)
00005.jpg: (140, 151, 593, 339)
00006.jpg: (20, 77, 420, 301)
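The annotations store each box as corner coordinates (xmin, ymin, xmax, ymax), while drawing APIs such as `matplotlib.patches.Rectangle` take an anchor point plus a width and height. A minimal sketch of that conversion, with a basic sanity check, follows. The helper names `bbox_to_xywh` and `bbox_area` are illustrative; the sample box is the one printed above for the training image 00001.jpg.

```python
def bbox_to_xywh(bbox):
    """Convert (xmin, ymin, xmax, ymax) to (x, y, width, height)."""
    xmin, ymin, xmax, ymax = bbox
    if xmax <= xmin or ymax <= ymin:
        raise ValueError(f"degenerate bounding box: {bbox}")
    return xmin, ymin, xmax - xmin, ymax - ymin


def bbox_area(bbox):
    """Area in pixels of a corner-format bounding box."""
    _, _, width, height = bbox_to_xywh(bbox)
    return width * height


sample_bbox = (39, 116, 569, 375)  # training image 00001.jpg, from the output above
sample_xywh = bbox_to_xywh(sample_bbox)
print(sample_xywh, bbox_area(sample_bbox))
```

Rejecting boxes with non-positive width or height early can surface annotation errors before they reach plotting or training code.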
- Display Result - bounding box
# Display images with bounding boxes
def display_image_with_bbox(image_path, annotation):
    # Load image
    img = Image.open(image_path)
    # Create plot
    fig, ax = plt.subplots(1)
    ax.imshow(img)
    # Read the box from the annotation argument rather than from a global `row`
    x_min, y_min, x_max, y_max = annotation['bbox']
    # Draw bounding box
    rect = patches.Rectangle(
        (x_min, y_min),  # anchor corner (x_min, y_min)
        x_max - x_min,   # width
        y_max - y_min,   # height
        linewidth=2,
        edgecolor='r',
        facecolor='none'
    )
    ax.add_patch(rect)
    # Add class label
    plt.text(
        x_min, y_min - 10,  # Position of the label
        annotation['image_class'],
        color='red',
        fontsize=12,
        backgroundcolor='white'
    )
    plt.axis('off')
    plt.show()
# Display bounding box for train images
print("For Training Images")
displayed_image_count = 0  # Initialize a counter to track displayed images
image_paths_details_training = []
images_paths_details_testing = []
for index, row in train_annotations_df.iterrows():
    if displayed_image_count >= 5:  # Stop once five images have been displayed
        break
    image_name = str(row['image_name']).strip()
    image_path = None  # Initialize image_path to None
    for class_folder in train_class_folders:
        potential_image_path = os.path.join(class_folder, image_name)
        if os.path.exists(potential_image_path):
            image_path = potential_image_path
            image_paths_details_training.append(potential_image_path)
            break  # Image found, no need to check other class folders
    if image_path:  # If image_path is found (not None)
        annotation = {
            'bbox': [row['xmin'], row['ymin'], row['xmax'], row['ymax']],
            'image_class': row['image_class']
        }
        display_image_with_bbox(image_path, annotation)
        displayed_image_count += 1  # Increment the counter
print(f"Displayed {displayed_image_count} training images with bounding boxes.")
For Training Images
Displayed 5 training images with bounding boxes.
# Display bounding box for test images
print("For Testing Images")
displayed_image_count_test = 0  # Initialize a counter to track displayed images
for index, row in test_annotations_df.iterrows():  # Use test_annotations_df DataFrame
    if displayed_image_count_test >= 5:  # Stop once five images have been displayed
        break
    image_name_test = str(row['image_name']).strip()
    image_path_test = None  # Initialize image_path_test to None
    for class_folder in test_class_folders:  # Use test_class_folders
        potential_image_path_test = os.path.join(class_folder, image_name_test)
        if os.path.exists(potential_image_path_test):
            image_path_test = potential_image_path_test
            images_paths_details_testing.append(potential_image_path_test)
            break  # Image found, no need to check other class folders
    if image_path_test:  # If image_path_test is found (not None)
        annotation_test = {
            'bbox': [row['xmin'], row['ymin'], row['xmax'], row['ymax']],
            'image_class': row['image_class']
        }
        display_image_with_bbox(image_path_test, annotation_test)
        displayed_image_count_test += 1  # Increment the counter
print(f"Displayed {displayed_image_count_test} test images with bounding boxes.")
For Testing Images
Displayed 5 test images with bounding boxes.
- Design Basic CNN Models
The Models designed are:
- MobileNetV2
- GoogleNet
- AlexNet
- ResNet
def preprocess_image(image_path, target_size=(224, 224)):
    """
    Load and preprocess an image for CNN input.
    """
    # Check if the image file exists
    if not os.path.exists(image_path):
        print(f"Warning: Image file not found: {image_path}")
        return None  # Or handle the missing image in a way that makes sense for your application
    image = cv2.imread(image_path)  # Load image
    # Check if image loading was successful
    if image is None:
        print(f"Warning: Failed to load image: {image_path}")
        return None  # Or handle the loading error as needed
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # Convert to RGB
    image = cv2.resize(image, target_size)  # Resize to target size
    image = image / 255.0  # Normalize pixel values to [0, 1]
    return image
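`preprocess_image` scales pixels to [0, 1]. The notebook also imports the Keras `preprocess_input` helpers (for example `mobilenet_preprocess`) but does not apply them in the cells shown here. To my understanding, the MobileNetV2 helper scales pixels to [-1, 1] instead. The sketch below only contrasts the two ranges on a toy array; the function names `scale_unit` and `scale_symmetric` are illustrative, and it makes no claim about which range the models in this notebook should use.

```python
import numpy as np


def scale_unit(pixels):
    """Map uint8 pixel values in [0, 255] to [0, 1], as preprocess_image does."""
    return pixels.astype(np.float32) / 255.0


def scale_symmetric(pixels):
    """Map uint8 pixel values in [0, 255] to [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0


toy_pixels = np.array([0, 128, 255], dtype=np.uint8)
print(scale_unit(toy_pixels))
print(scale_symmetric(toy_pixels))
```

If a pretrained backbone was trained with one scaling and is fed another, fine-tuning can still work but may start from a weaker initial fit, so the choice is worth checking per model.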
def custom_generator(df, batch_size, target_size):
    """
    Custom generator for images and labels.
    """
    num_samples = len(df)
    while True:
        for offset in range(0, num_samples, batch_size):
            batch_samples = df.iloc[offset:offset + batch_size]
            images = []
            labels = []
            for _, row in batch_samples.iterrows():
                image = preprocess_image(row['Image_Path'], target_size)
                label = row['label_categorical']
                images.append(image)
                labels.append(label)
            X = np.array(images, dtype=np.float32)
            y = np.array(labels, dtype=np.float32)
            yield X, y
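`custom_generator` walks the DataFrame in strides of `batch_size`, so the last batch of each pass can be smaller than the others, and the generator loops forever. Keras therefore needs to be told how many batches make up one pass. A minimal sketch of that bookkeeping follows; the helper names and the sample count of 50 are illustrative only.

```python
import math


def batch_slices(num_samples, batch_size):
    """Yield (start, stop) index pairs covering one pass over the data."""
    for start in range(0, num_samples, batch_size):
        yield start, min(start + batch_size, num_samples)


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


demo_slices = list(batch_slices(50, 16))
demo_steps = steps_per_epoch(50, 16)
print(demo_slices, demo_steps)
```

Using the ceiling rather than floor division ensures the final partial batch is included in each epoch.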
# Apply preprocessing to all images
df_testing['image'] = df_testing['Image_Path'].apply(preprocess_image)
df_training['image'] = df_training['Image_Path'].apply(preprocess_image)
# Check for and handle None values in the 'image' column
df_testing = df_testing.dropna(subset=['image']) # Remove rows with None in 'image'
df_training = df_training.dropna(subset=['image']) # Remove rows with None in 'image'
# Encode labels: fit the encoder on the training labels and reuse the same mapping for the test set
label_encoder = LabelEncoder()
df_training['labels_encoded'] = label_encoder.fit_transform(df_training['labels'])
df_testing['labels_encoded'] = label_encoder.transform(df_testing['labels'])
# Convert labels to categorical (one-hot encoding)
num_classes = len(label_encoder.classes_)
df_testing['label_categorical'] = df_testing['labels_encoded'].apply(lambda x: to_categorical(x, num_classes=num_classes))
df_training['label_categorical'] = df_training['labels_encoded'].apply(lambda x: to_categorical(x, num_classes=num_classes))
# Split df_training into training and validation sets
df_train, df_val = train_test_split(df_training, test_size=0.2, random_state=42)
# Create generators
#batch_size = 32
batch_size = 16
train_generator = custom_generator(df_train, batch_size, target_size=(224, 224))
val_generator = custom_generator(df_val, batch_size, target_size=(224, 224)) # Use df_val for validation
# Test generator remains the same
test_generator = custom_generator(df_testing, batch_size, target_size=(224, 224))
# Check training generator
X_batch, y_batch = next(train_generator)
print("Training batch shape:", X_batch.shape, y_batch.shape)
# Check validation generator
X_batch, y_batch = next(val_generator)
print("Validation batch shape:", X_batch.shape, y_batch.shape)
Training batch shape: (16, 224, 224, 3) (16, 196)
Validation batch shape: (16, 224, 224, 3) (16, 196)
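The label width of 196 in the batch shapes comes from one-hot encoding the integer class indices. A minimal pure-Python sketch of the two steps follows: build a label-to-index mapping from the training labels (in sorted order, which as far as I know is also how scikit-learn's `LabelEncoder` orders its classes), then reuse that same mapping for the test labels. The toy labels and helper names are made up for illustration.

```python
def fit_label_index(labels):
    """Assign each distinct label an integer index in sorted order."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def encode(labels, label_index):
    """Translate labels to integer indices using an existing mapping."""
    return [label_index[label] for label in labels]


def one_hot(index, num_classes):
    """Return a list of zeros with a single 1.0 at position index."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


toy_train_labels = ["suv", "coupe", "sedan", "coupe"]
toy_test_labels = ["sedan", "suv"]

label_index = fit_label_index(toy_train_labels)
train_encoded = encode(toy_train_labels, label_index)
test_encoded = encode(toy_test_labels, label_index)
test_one_hot = [one_hot(i, len(label_index)) for i in test_encoded]
print(label_index, train_encoded, test_encoded, test_one_hot)
```

Sharing one mapping between splits guarantees that index 0 means the same class in training and testing; fitting a separate mapping per split only gives the same result when both splits contain exactly the same set of labels.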
# Generate a classification report from a Keras/TensorFlow model using GPU-accelerated prediction.
# Assumes df_val['image'] contains pre-loaded images as np.arrays and df_val['label_categorical'] is one-hot encoded.
# Returns y_val_pred and y_val_true (predicted and true label indices) and df_report (the report as a DataFrame).
def generate_classification_report_tf_model(
    model,  # model
    df_val,  # val data frame
    label_encoder,  # label encoder
    preprocess_fn,  # preprocess_input (not used inside this function)
    batch_size=32,
    report_name="model_report.csv"
):
    # Convert image and label columns to NumPy arrays
    images = np.stack(df_val['image'].values).astype(np.float32)
    labels = np.stack(df_val['label_categorical'].values)
    # Build tf.data.Dataset
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))
    dataset = dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)
    # Predict
    preds = model.predict(dataset, verbose=1)
    y_val_pred = np.argmax(preds, axis=1)
    y_val_true = np.argmax(labels, axis=1)
    # Evaluation
    acc = accuracy_score(y_val_true, y_val_pred)
    # Build the per-class report and save it as CSV
    report = classification_report(
        y_val_true, y_val_pred,
        target_names=label_encoder.classes_,
        output_dict=True,
        zero_division=1
    )
    df_report = pd.DataFrame(report).transpose()
    df_report.loc["overall_accuracy"] = [acc, None, None, None]
    df_report.to_csv(report_name)
    print(f"Report saved as: {report_name}")
    # Print only the average metrics
    print(f"Model Accuracy: {acc:.4f}")
    print("Average Summary Metrics:")
    print(df_report.loc[["macro avg", "weighted avg", "overall_accuracy"]][["precision", "recall", "f1-score"]])
    return y_val_pred, y_val_true, df_report
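The summary printed by this function reports both a "macro avg" and a "weighted avg". The macro average treats every class equally, while the weighted average weights each class by its support (the number of true samples). A minimal pure-Python sketch of the difference follows, using made-up per-class scores and supports; the helper name `macro_and_weighted` is illustrative.

```python
def macro_and_weighted(per_class):
    """per_class: list of (score, support) pairs. Returns (macro_avg, weighted_avg)."""
    scores = [score for score, _ in per_class]
    total_support = sum(support for _, support in per_class)
    macro = sum(scores) / len(scores)
    weighted = sum(score * support for score, support in per_class) / total_support
    return macro, weighted


# Made-up example: a frequent class scored well, a rare class scored poorly.
demo_per_class = [(0.9, 90), (0.5, 10)]
demo_macro, demo_weighted = macro_and_weighted(demo_per_class)
print(f"macro avg: {demo_macro:.2f}, weighted avg: {demo_weighted:.2f}")
```

When the two averages diverge noticeably, the model is performing unevenly across frequent and rare classes, which is worth checking with 196 classes and only a few dozen images each.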
def plot_training_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    # Plot accuracy
    ax1.plot(history.history['accuracy'], label='Training Accuracy')
    ax1.plot(history.history['val_accuracy'], label='Validation Accuracy')
    ax1.set_title('Model Accuracy')
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Accuracy')
    ax1.legend()
    # Plot loss
    ax2.plot(history.history['loss'], label='Training Loss')
    ax2.plot(history.history['val_loss'], label='Validation Loss')
    ax2.set_title('Model Loss')
    ax2.set_xlabel('Epochs')
    ax2.set_ylabel('Loss')
    ax2.legend()
    plt.show()
6A. MobileNetV2
# 224x224 input; the `classes` argument only applies when include_top=True, so it is omitted here
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze all but the last 4 layers for efficient training
for layer in base_model.layers[:-4]:
    layer.trainable = False
# Add custom classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x) # Reduces parameters
x = BatchNormalization()(x) # Stabilizes training
x = Dense(128, activation='relu')(x)
x = Dropout(0.3)(x) # Dropout for regularization
predictions = Dense(len(label_encoder.classes_), activation='softmax')(x) # Output layer
#Split 80-20 of train images
df_train_mobilenet, df_val_mobilenet = train_test_split(df_training, test_size=0.2, random_state=42)
mobilenet_batch_size=16
#df_train_mobilenet_gen = custom_generator(df_train_mobilenet,mobilenet_batch_size,target_size=(128,128))
#df_val_mobilenet_gen = custom_generator(df_val_mobilenet,mobilenet_batch_size,target_size=(128,128))
df_train_mobilenet_gen = custom_generator(df_train_mobilenet,mobilenet_batch_size,target_size=(224,224))
df_val_mobilenet_gen = custom_generator(df_val_mobilenet,mobilenet_batch_size,target_size=(224,224))
# Create the model
mobilenet_model = Model(inputs=base_model.input, outputs=predictions)
# Compile the model
mobilenet_model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
mobilenet_model.summary()
# Define steps per epoch
steps_per_epoch = np.ceil(len(df_train_mobilenet) / mobilenet_batch_size).astype(int)
validation_steps = np.ceil(len(df_val_mobilenet) / mobilenet_batch_size).astype(int)
y_true = np.array(df_train_mobilenet['labels_encoded'].tolist())
# Compute class weights based on actual class distribution
# (class_weight_dict is computed here but is not passed to fit() below)
class_weights = compute_class_weight('balanced', classes=np.unique(y_true), y=y_true)
class_weight_dict = dict(enumerate(class_weights))
# Training
history_mobilenet = mobilenet_model.fit(
    df_train_mobilenet_gen,
    steps_per_epoch=steps_per_epoch,
    validation_data=df_val_mobilenet_gen,
    validation_steps=validation_steps,
    epochs=10  # Reduce epochs to speed up training
)
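The cell above computes `'balanced'` class weights with scikit-learn. That heuristic assigns each class the weight `n_samples / (n_classes * count_of_class)`, so rarer classes get larger weights. A minimal pure-Python sketch of the formula follows, on made-up labels; the helper name `balanced_class_weights` is illustrative.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class: n_samples / (n_classes * count_of_class)}."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in sorted(counts.items())}


demo_y = [0, 0, 0, 1]  # made-up labels: class 0 is three times as frequent as class 1
demo_weights = balanced_class_weights(demo_y)
print(demo_weights)
```

Keras `fit` accepts a `class_weight` argument for such a dictionary; whether it behaves as intended with the Python generator and one-hot labels used here would need to be verified before relying on it.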
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0 []
Conv1 (Conv2D) (None, 112, 112, 32) 864 ['input_1[0][0]']
bn_Conv1 (BatchNormalization) (None, 112, 112, 32) 128 ['Conv1[0][0]']
Conv1_relu (ReLU) (None, 112, 112, 32) 0 ['bn_Conv1[0][0]']
expanded_conv_depthwise (DepthwiseConv2D) (None, 112, 112, 32) 288 ['Conv1_relu[0][0]']
expanded_conv_depthwise_BN (BatchNormalization) (None, 112, 112, 32) 128 ['expanded_conv_depthwise[0][0]']
expanded_conv_depthwise_relu (ReLU) (None, 112, 112, 32) 0 ['expanded_conv_depthwise_BN[0][0]']
expanded_conv_project (Conv2D) (None, 112, 112, 16) 512 ['expanded_conv_depthwise_relu[0][0]']
expanded_conv_project_BN (BatchNormalization) (None, 112, 112, 16) 64 ['expanded_conv_project[0][0]']
block_1_expand (Conv2D) (None, 112, 112, 96) 1536 ['expanded_conv_project_BN[0][0]']
block_1_expand_BN (BatchNormalization) (None, 112, 112, 96) 384 ['block_1_expand[0][0]']
block_1_expand_relu (ReLU) (None, 112, 112, 96) 0 ['block_1_expand_BN[0][0]']
block_1_pad (ZeroPadding2D) (None, 113, 113, 96) 0 ['block_1_expand_relu[0][0]']
block_1_depthwise (DepthwiseConv2D) (None, 56, 56, 96) 864 ['block_1_pad[0][0]']
block_1_depthwise_BN (BatchNormalization) (None, 56, 56, 96) 384 ['block_1_depthwise[0][0]']
block_1_depthwise_relu (ReLU) (None, 56, 56, 96) 0 ['block_1_depthwise_BN[0][0]']
block_1_project (Conv2D) (None, 56, 56, 24) 2304 ['block_1_depthwise_relu[0][0]']
block_1_project_BN (BatchNormalization) (None, 56, 56, 24) 96 ['block_1_project[0][0]']
block_2_expand (Conv2D) (None, 56, 56, 144) 3456 ['block_1_project_BN[0][0]']
block_2_expand_BN (BatchNormalization) (None, 56, 56, 144) 576 ['block_2_expand[0][0]']
block_2_expand_relu (ReLU) (None, 56, 56, 144) 0 ['block_2_expand_BN[0][0]']
block_2_depthwise (DepthwiseConv2D) (None, 56, 56, 144) 1296 ['block_2_expand_relu[0][0]']
block_2_depthwise_BN (BatchNormalization) (None, 56, 56, 144) 576 ['block_2_depthwise[0][0]']
block_2_depthwise_relu (ReLU) (None, 56, 56, 144) 0 ['block_2_depthwise_BN[0][0]']
block_2_project (Conv2D) (None, 56, 56, 24) 3456 ['block_2_depthwise_relu[0][0]']
block_2_project_BN (BatchNormalization) (None, 56, 56, 24) 96 ['block_2_project[0][0]']
block_2_add (Add) (None, 56, 56, 24) 0 ['block_1_project_BN[0][0]', 'block_2_project_BN[0][0]']
block_3_expand (Conv2D) (None, 56, 56, 144) 3456 ['block_2_add[0][0]']
block_3_expand_BN (BatchNo (None, 56, 56, 144) 576 ['block_3_expand[0][0]']
rmalization)
block_3_expand_relu (ReLU) (None, 56, 56, 144) 0 ['block_3_expand_BN[0][0]']
block_3_pad (ZeroPadding2D (None, 57, 57, 144) 0 ['block_3_expand_relu[0][0]']
)
block_3_depthwise (Depthwi (None, 28, 28, 144) 1296 ['block_3_pad[0][0]']
seConv2D)
block_3_depthwise_BN (Batc (None, 28, 28, 144) 576 ['block_3_depthwise[0][0]']
hNormalization)
block_3_depthwise_relu (Re (None, 28, 28, 144) 0 ['block_3_depthwise_BN[0][0]']
LU)
block_3_project (Conv2D) (None, 28, 28, 32) 4608 ['block_3_depthwise_relu[0][0]
']
block_3_project_BN (BatchN (None, 28, 28, 32) 128 ['block_3_project[0][0]']
ormalization)
block_4_expand (Conv2D) (None, 28, 28, 192) 6144 ['block_3_project_BN[0][0]']
block_4_expand_BN (BatchNo (None, 28, 28, 192) 768 ['block_4_expand[0][0]']
rmalization)
block_4_expand_relu (ReLU) (None, 28, 28, 192) 0 ['block_4_expand_BN[0][0]']
block_4_depthwise (Depthwi (None, 28, 28, 192) 1728 ['block_4_expand_relu[0][0]']
seConv2D)
block_4_depthwise_BN (Batc (None, 28, 28, 192) 768 ['block_4_depthwise[0][0]']
hNormalization)
block_4_depthwise_relu (Re (None, 28, 28, 192) 0 ['block_4_depthwise_BN[0][0]']
LU)
block_4_project (Conv2D) (None, 28, 28, 32) 6144 ['block_4_depthwise_relu[0][0]
']
block_4_project_BN (BatchN (None, 28, 28, 32) 128 ['block_4_project[0][0]']
ormalization)
block_4_add (Add) (None, 28, 28, 32) 0 ['block_3_project_BN[0][0]',
'block_4_project_BN[0][0]']
block_5_expand (Conv2D) (None, 28, 28, 192) 6144 ['block_4_add[0][0]']
block_5_expand_BN (BatchNo (None, 28, 28, 192) 768 ['block_5_expand[0][0]']
rmalization)
block_5_expand_relu (ReLU) (None, 28, 28, 192) 0 ['block_5_expand_BN[0][0]']
block_5_depthwise (Depthwi (None, 28, 28, 192) 1728 ['block_5_expand_relu[0][0]']
seConv2D)
block_5_depthwise_BN (Batc (None, 28, 28, 192) 768 ['block_5_depthwise[0][0]']
hNormalization)
block_5_depthwise_relu (Re (None, 28, 28, 192) 0 ['block_5_depthwise_BN[0][0]']
LU)
block_5_project (Conv2D) (None, 28, 28, 32) 6144 ['block_5_depthwise_relu[0][0]
']
block_5_project_BN (BatchN (None, 28, 28, 32) 128 ['block_5_project[0][0]']
ormalization)
block_5_add (Add) (None, 28, 28, 32) 0 ['block_4_add[0][0]',
'block_5_project_BN[0][0]']
block_6_expand (Conv2D) (None, 28, 28, 192) 6144 ['block_5_add[0][0]']
block_6_expand_BN (BatchNo (None, 28, 28, 192) 768 ['block_6_expand[0][0]']
rmalization)
block_6_expand_relu (ReLU) (None, 28, 28, 192) 0 ['block_6_expand_BN[0][0]']
block_6_pad (ZeroPadding2D (None, 29, 29, 192) 0 ['block_6_expand_relu[0][0]']
)
block_6_depthwise (Depthwi (None, 14, 14, 192) 1728 ['block_6_pad[0][0]']
seConv2D)
block_6_depthwise_BN (Batc (None, 14, 14, 192) 768 ['block_6_depthwise[0][0]']
hNormalization)
block_6_depthwise_relu (Re (None, 14, 14, 192) 0 ['block_6_depthwise_BN[0][0]']
LU)
block_6_project (Conv2D) (None, 14, 14, 64) 12288 ['block_6_depthwise_relu[0][0]
']
block_6_project_BN (BatchN (None, 14, 14, 64) 256 ['block_6_project[0][0]']
ormalization)
block_7_expand (Conv2D) (None, 14, 14, 384) 24576 ['block_6_project_BN[0][0]']
block_7_expand_BN (BatchNo (None, 14, 14, 384) 1536 ['block_7_expand[0][0]']
rmalization)
block_7_expand_relu (ReLU) (None, 14, 14, 384) 0 ['block_7_expand_BN[0][0]']
block_7_depthwise (Depthwi (None, 14, 14, 384) 3456 ['block_7_expand_relu[0][0]']
seConv2D)
block_7_depthwise_BN (Batc (None, 14, 14, 384) 1536 ['block_7_depthwise[0][0]']
hNormalization)
block_7_depthwise_relu (Re (None, 14, 14, 384) 0 ['block_7_depthwise_BN[0][0]']
LU)
block_7_project (Conv2D) (None, 14, 14, 64) 24576 ['block_7_depthwise_relu[0][0]
']
block_7_project_BN (BatchN (None, 14, 14, 64) 256 ['block_7_project[0][0]']
ormalization)
block_7_add (Add) (None, 14, 14, 64) 0 ['block_6_project_BN[0][0]',
'block_7_project_BN[0][0]']
block_8_expand (Conv2D) (None, 14, 14, 384) 24576 ['block_7_add[0][0]']
block_8_expand_BN (BatchNo (None, 14, 14, 384) 1536 ['block_8_expand[0][0]']
rmalization)
block_8_expand_relu (ReLU) (None, 14, 14, 384) 0 ['block_8_expand_BN[0][0]']
block_8_depthwise (Depthwi (None, 14, 14, 384) 3456 ['block_8_expand_relu[0][0]']
seConv2D)
block_8_depthwise_BN (Batc (None, 14, 14, 384) 1536 ['block_8_depthwise[0][0]']
hNormalization)
block_8_depthwise_relu (Re (None, 14, 14, 384) 0 ['block_8_depthwise_BN[0][0]']
LU)
block_8_project (Conv2D) (None, 14, 14, 64) 24576 ['block_8_depthwise_relu[0][0]
']
block_8_project_BN (BatchN (None, 14, 14, 64) 256 ['block_8_project[0][0]']
ormalization)
block_8_add (Add) (None, 14, 14, 64) 0 ['block_7_add[0][0]',
'block_8_project_BN[0][0]']
block_9_expand (Conv2D) (None, 14, 14, 384) 24576 ['block_8_add[0][0]']
block_9_expand_BN (BatchNo (None, 14, 14, 384) 1536 ['block_9_expand[0][0]']
rmalization)
block_9_expand_relu (ReLU) (None, 14, 14, 384) 0 ['block_9_expand_BN[0][0]']
block_9_depthwise (Depthwi (None, 14, 14, 384) 3456 ['block_9_expand_relu[0][0]']
seConv2D)
block_9_depthwise_BN (Batc (None, 14, 14, 384) 1536 ['block_9_depthwise[0][0]']
hNormalization)
block_9_depthwise_relu (Re (None, 14, 14, 384) 0 ['block_9_depthwise_BN[0][0]']
LU)
block_9_project (Conv2D) (None, 14, 14, 64) 24576 ['block_9_depthwise_relu[0][0]
']
block_9_project_BN (BatchN (None, 14, 14, 64) 256 ['block_9_project[0][0]']
ormalization)
block_9_add (Add) (None, 14, 14, 64) 0 ['block_8_add[0][0]',
'block_9_project_BN[0][0]']
block_10_expand (Conv2D) (None, 14, 14, 384) 24576 ['block_9_add[0][0]']
block_10_expand_BN (BatchN (None, 14, 14, 384) 1536 ['block_10_expand[0][0]']
ormalization)
block_10_expand_relu (ReLU (None, 14, 14, 384) 0 ['block_10_expand_BN[0][0]']
)
block_10_depthwise (Depthw (None, 14, 14, 384) 3456 ['block_10_expand_relu[0][0]']
iseConv2D)
block_10_depthwise_BN (Bat (None, 14, 14, 384) 1536 ['block_10_depthwise[0][0]']
chNormalization)
block_10_depthwise_relu (R (None, 14, 14, 384) 0 ['block_10_depthwise_BN[0][0]'
eLU) ]
block_10_project (Conv2D) (None, 14, 14, 96) 36864 ['block_10_depthwise_relu[0][0
]']
block_10_project_BN (Batch (None, 14, 14, 96) 384 ['block_10_project[0][0]']
Normalization)
block_11_expand (Conv2D) (None, 14, 14, 576) 55296 ['block_10_project_BN[0][0]']
block_11_expand_BN (BatchN (None, 14, 14, 576) 2304 ['block_11_expand[0][0]']
ormalization)
block_11_expand_relu (ReLU (None, 14, 14, 576) 0 ['block_11_expand_BN[0][0]']
)
block_11_depthwise (Depthw (None, 14, 14, 576) 5184 ['block_11_expand_relu[0][0]']
iseConv2D)
block_11_depthwise_BN (Bat (None, 14, 14, 576) 2304 ['block_11_depthwise[0][0]']
chNormalization)
block_11_depthwise_relu (R (None, 14, 14, 576) 0 ['block_11_depthwise_BN[0][0]'
eLU) ]
block_11_project (Conv2D) (None, 14, 14, 96) 55296 ['block_11_depthwise_relu[0][0
]']
block_11_project_BN (Batch (None, 14, 14, 96) 384 ['block_11_project[0][0]']
Normalization)
block_11_add (Add) (None, 14, 14, 96) 0 ['block_10_project_BN[0][0]',
'block_11_project_BN[0][0]']
block_12_expand (Conv2D) (None, 14, 14, 576) 55296 ['block_11_add[0][0]']
block_12_expand_BN (BatchN (None, 14, 14, 576) 2304 ['block_12_expand[0][0]']
ormalization)
block_12_expand_relu (ReLU (None, 14, 14, 576) 0 ['block_12_expand_BN[0][0]']
)
block_12_depthwise (Depthw (None, 14, 14, 576) 5184 ['block_12_expand_relu[0][0]']
iseConv2D)
block_12_depthwise_BN (Bat (None, 14, 14, 576) 2304 ['block_12_depthwise[0][0]']
chNormalization)
block_12_depthwise_relu (R (None, 14, 14, 576) 0 ['block_12_depthwise_BN[0][0]'
eLU) ]
block_12_project (Conv2D) (None, 14, 14, 96) 55296 ['block_12_depthwise_relu[0][0
]']
block_12_project_BN (Batch (None, 14, 14, 96) 384 ['block_12_project[0][0]']
Normalization)
block_12_add (Add) (None, 14, 14, 96) 0 ['block_11_add[0][0]',
'block_12_project_BN[0][0]']
block_13_expand (Conv2D) (None, 14, 14, 576) 55296 ['block_12_add[0][0]']
block_13_expand_BN (BatchN (None, 14, 14, 576) 2304 ['block_13_expand[0][0]']
ormalization)
block_13_expand_relu (ReLU (None, 14, 14, 576) 0 ['block_13_expand_BN[0][0]']
)
block_13_pad (ZeroPadding2 (None, 15, 15, 576) 0 ['block_13_expand_relu[0][0]']
D)
block_13_depthwise (Depthw (None, 7, 7, 576) 5184 ['block_13_pad[0][0]']
iseConv2D)
block_13_depthwise_BN (Bat (None, 7, 7, 576) 2304 ['block_13_depthwise[0][0]']
chNormalization)
block_13_depthwise_relu (R (None, 7, 7, 576) 0 ['block_13_depthwise_BN[0][0]'
eLU) ]
block_13_project (Conv2D) (None, 7, 7, 160) 92160 ['block_13_depthwise_relu[0][0
]']
block_13_project_BN (Batch (None, 7, 7, 160) 640 ['block_13_project[0][0]']
Normalization)
block_14_expand (Conv2D) (None, 7, 7, 960) 153600 ['block_13_project_BN[0][0]']
block_14_expand_BN (BatchN (None, 7, 7, 960) 3840 ['block_14_expand[0][0]']
ormalization)
block_14_expand_relu (ReLU (None, 7, 7, 960) 0 ['block_14_expand_BN[0][0]']
)
block_14_depthwise (Depthw (None, 7, 7, 960) 8640 ['block_14_expand_relu[0][0]']
iseConv2D)
block_14_depthwise_BN (Bat (None, 7, 7, 960) 3840 ['block_14_depthwise[0][0]']
chNormalization)
block_14_depthwise_relu (R (None, 7, 7, 960) 0 ['block_14_depthwise_BN[0][0]'
eLU) ]
block_14_project (Conv2D) (None, 7, 7, 160) 153600 ['block_14_depthwise_relu[0][0
]']
block_14_project_BN (Batch (None, 7, 7, 160) 640 ['block_14_project[0][0]']
Normalization)
block_14_add (Add) (None, 7, 7, 160) 0 ['block_13_project_BN[0][0]',
'block_14_project_BN[0][0]']
block_15_expand (Conv2D) (None, 7, 7, 960) 153600 ['block_14_add[0][0]']
block_15_expand_BN (BatchN (None, 7, 7, 960) 3840 ['block_15_expand[0][0]']
ormalization)
block_15_expand_relu (ReLU (None, 7, 7, 960) 0 ['block_15_expand_BN[0][0]']
)
block_15_depthwise (Depthw (None, 7, 7, 960) 8640 ['block_15_expand_relu[0][0]']
iseConv2D)
block_15_depthwise_BN (Bat (None, 7, 7, 960) 3840 ['block_15_depthwise[0][0]']
chNormalization)
block_15_depthwise_relu (R (None, 7, 7, 960) 0 ['block_15_depthwise_BN[0][0]'
eLU) ]
block_15_project (Conv2D) (None, 7, 7, 160) 153600 ['block_15_depthwise_relu[0][0
]']
block_15_project_BN (Batch (None, 7, 7, 160) 640 ['block_15_project[0][0]']
Normalization)
block_15_add (Add) (None, 7, 7, 160) 0 ['block_14_add[0][0]',
'block_15_project_BN[0][0]']
block_16_expand (Conv2D) (None, 7, 7, 960) 153600 ['block_15_add[0][0]']
block_16_expand_BN (BatchN (None, 7, 7, 960) 3840 ['block_16_expand[0][0]']
ormalization)
block_16_expand_relu (ReLU (None, 7, 7, 960) 0 ['block_16_expand_BN[0][0]']
)
block_16_depthwise (Depthw (None, 7, 7, 960) 8640 ['block_16_expand_relu[0][0]']
iseConv2D)
block_16_depthwise_BN (Bat (None, 7, 7, 960) 3840 ['block_16_depthwise[0][0]']
chNormalization)
block_16_depthwise_relu (R (None, 7, 7, 960) 0 ['block_16_depthwise_BN[0][0]'
eLU) ]
block_16_project (Conv2D) (None, 7, 7, 320) 307200 ['block_16_depthwise_relu[0][0
]']
block_16_project_BN (Batch (None, 7, 7, 320) 1280 ['block_16_project[0][0]']
Normalization)
Conv_1 (Conv2D) (None, 7, 7, 1280) 409600 ['block_16_project_BN[0][0]']
Conv_1_bn (BatchNormalizat (None, 7, 7, 1280) 5120 ['Conv_1[0][0]']
ion)
out_relu (ReLU) (None, 7, 7, 1280) 0 ['Conv_1_bn[0][0]']
global_average_pooling2d ( (None, 1280) 0 ['out_relu[0][0]']
GlobalAveragePooling2D)
batch_normalization (Batch (None, 1280) 5120 ['global_average_pooling2d[0][
Normalization) 0]']
dense (Dense) (None, 128) 163968 ['batch_normalization[0][0]']
dropout (Dropout) (None, 128) 0 ['dense[0][0]']
dense_1 (Dense) (None, 196) 25284 ['dropout[0][0]']
==================================================================================================
Total params: 2452356 (9.35 MB)
Trainable params: 604612 (2.31 MB)
Non-trainable params: 1847744 (7.05 MB)
__________________________________________________________________________________________________
Epoch 1/10
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1743244946.072535 26539 service.cc:145] XLA service 0x7ff2f07d5ab0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1743244946.072576 26539 service.cc:153] StreamExecutor device (0): NVIDIA A10G, Compute Capability 8.6
I0000 00:00:1743244946.144123 26539 device_compiler.h:188] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
408/408 [==============================] - 55s 77ms/step - loss: 5.3014 - accuracy: 0.0173 - val_loss: 4.8354 - val_accuracy: 0.0473
Epoch 2/10
408/408 [==============================] - 27s 65ms/step - loss: 4.3497 - accuracy: 0.1039 - val_loss: 4.3516 - val_accuracy: 0.1025
Epoch 3/10
408/408 [==============================] - 26s 64ms/step - loss: 3.6916 - accuracy: 0.2080 - val_loss: 3.9913 - val_accuracy: 0.1572
Epoch 4/10
408/408 [==============================] - 26s 64ms/step - loss: 3.1578 - accuracy: 0.3101 - val_loss: 3.6960 - val_accuracy: 0.1940
Epoch 5/10
408/408 [==============================] - 25s 62ms/step - loss: 2.7185 - accuracy: 0.4051 - val_loss: 3.4833 - val_accuracy: 0.2253
Epoch 6/10
408/408 [==============================] - 25s 61ms/step - loss: 2.3319 - accuracy: 0.4890 - val_loss: 3.3144 - val_accuracy: 0.2597
Epoch 7/10
408/408 [==============================] - 23s 57ms/step - loss: 1.9890 - accuracy: 0.5653 - val_loss: 3.1895 - val_accuracy: 0.2683
Epoch 8/10
408/408 [==============================] - 21s 52ms/step - loss: 1.7209 - accuracy: 0.6278 - val_loss: 3.0856 - val_accuracy: 0.2799
Epoch 9/10
408/408 [==============================] - 21s 52ms/step - loss: 1.4624 - accuracy: 0.6921 - val_loss: 3.0096 - val_accuracy: 0.2977
Epoch 10/10
408/408 [==============================] - 21s 52ms/step - loss: 1.2639 - accuracy: 0.7383 - val_loss: 2.9323 - val_accuracy: 0.3082
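To make the trend in the training log easier to read, the per-epoch accuracies can be tabulated together with the gap between training and validation accuracy. The sketch below copies the figures from the log by hand; in the notebook the same values would presumably be available as `history_mobilenet.history['accuracy']` and `history_mobilenet.history['val_accuracy']`, which is an assumption about how the run was recorded.

```python
# Accuracy values copied by hand from the training log.
train_acc = [0.0173, 0.1039, 0.2080, 0.3101, 0.4051,
             0.4890, 0.5653, 0.6278, 0.6921, 0.7383]
val_acc = [0.0473, 0.1025, 0.1572, 0.1940, 0.2253,
           0.2597, 0.2683, 0.2799, 0.2977, 0.3082]

# Gap between training and validation accuracy for each epoch.
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]

for epoch, (t, v, g) in enumerate(zip(train_acc, val_acc, gaps), start=1):
    print(f"epoch {epoch:2d}  train={t:.4f}  val={v:.4f}  gap={g:+.4f}")

final_gap = gaps[-1]
```

The gap grows every epoch and ends at about 0.43, which is the pattern one would expect when the model fits the training set faster than it generalizes.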
# Display model accuracy and loss
plot_training_history(history_mobilenet)
y_pred, y_true, df_mobilenet_classification_report = generate_classification_report_tf_model(
    model=mobilenet_model,
    df_val=df_val_mobilenet,
    label_encoder=label_encoder,
    preprocess_fn=mobilenet_preprocess,
    batch_size=32,
    report_name="mobilenet_classification_report.csv"
)
51/51 [==============================] - 5s 20ms/step
Model Accuracy: 0.3088
Classification Report:
Report saved as: mobilenet_classification_report.csv
Model Accuracy: 0.3088
Average Summary Metrics:
                  precision    recall  f1-score
macro avg          0.321495  0.312861  0.295274
weighted avg       0.346274  0.308778  0.305494
overall_accuracy   0.308778       NaN       NaN
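The summary lists both a macro and a weighted average. As a reminder of how the two differ: the macro average treats every class equally, while the weighted average weights each class by its support. The toy numbers in this sketch are hypothetical and are not taken from the report.

```python
import numpy as np

# Hypothetical per-class F1 scores and supports (not from the report).
f1_per_class = np.array([0.80, 0.50, 0.20])
support = np.array([60, 30, 10])

# Macro average: plain mean over classes.
macro_f1 = f1_per_class.mean()

# Weighted average: mean over classes, weighted by class support.
weighted_f1 = np.average(f1_per_class, weights=support)

print(f"macro avg F1:    {macro_f1:.3f}")
print(f"weighted avg F1: {weighted_f1:.3f}")
```

In the reported table the weighted-average recall equals the overall accuracy (0.308778). That is expected, because support-weighted recall is the fraction of all samples that were classified correctly.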
Displaying only the 10 classes with the highest support
df_support = df_mobilenet_classification_report.iloc[:-3] # exclude average rows
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
top_10_indices = [np.where(label_encoder.classes_ == cls)[0][0] for cls in top_10_classes]
mobilenet_cm = confusion_matrix(y_true, y_pred)
mobilenet_cm_top10 = mobilenet_cm[np.ix_(top_10_indices, top_10_indices)]
plt.figure(figsize=(10, 8))
sns.heatmap(mobilenet_cm_top10, annot=True, fmt='d',
            xticklabels=top_10_classes, yticklabels=top_10_classes,
            cmap='Blues')
plt.title("MobileNet Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
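The cell above restricts the full confusion matrix to selected classes with `np.ix_`. A minimal sketch of that indexing step on a small, hypothetical 4-class matrix (the values and the `keep` list are made up for illustration):

```python
import numpy as np

# Hypothetical 4-class confusion matrix (rows: true, columns: predicted).
cm = np.array([
    [5, 1, 0, 2],
    [0, 7, 1, 0],
    [2, 0, 3, 1],
    [1, 1, 0, 6],
])

keep = [3, 1]  # class indices to keep, in display order

# np.ix_ builds an open mesh so that rows and columns are both restricted to `keep`.
sub = cm[np.ix_(keep, keep)]
print(sub)
```

Predictions that fall into classes outside the selection are dropped, so the rows of the sub-matrix no longer sum to the class support. In this toy case class 3 has 8 samples, but its row in `sub` sums to 7.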
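MobileNet Model Summary
Poor generalization of the model
After 10 epochs, training accuracy reaches 0.7383 while validation accuracy is only 0.3082, and validation loss (2.9323) stays well above training loss (1.2639), which points to overfitting.
Based on the classification report (overall accuracy 0.3088, macro-average F1 0.2953), most validation images are not predicted correctly.
There are indications of class imbalance in the dataset.
MobileNet is a lightweight model intended for real-time applications on mobile devices, whereas this task requires fine-grained car classification across 196 classes.
Next Steps
Handle the data imbalance, for example by applying the class weights that are computed in the training cell but not passed to fit().
Tune hyperparameters to improve performance.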
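As a sketch of the first next step, the "balanced" heuristic used by `compute_class_weight` can be reproduced with NumPy alone: each class receives the weight `n_samples / (n_classes * class_count)`. The labels below are hypothetical. Whether the resulting dictionary can be passed through the `class_weight` argument of `fit()` together with the notebook's `custom_generator` depends on the Keras version, so treat that part as an assumption to verify.

```python
import numpy as np

# Hypothetical integer-encoded labels with an uneven class distribution.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])

classes, counts = np.unique(y, return_counts=True)

# "balanced" heuristic: n_samples / (n_classes * number of samples in each class)
weights = len(y) / (len(classes) * counts)

# Map each class index to its weight, so rare classes count more in the loss.
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
print(class_weight)
```

Rare classes receive weights above 1 and frequent classes weights below 1, so each class contributes roughly equally to the total loss.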
6B. GoogleNet (InceptionV3)
base_model = InceptionV3(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Add custom layers on top of the base model
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(196, activation='softmax')(x)
# Define the complete model
googlenet_model = Model(inputs=base_model.input, outputs=predictions)
# Freeze the layers of the base model
for layer in base_model.layers:
    layer.trainable = False
# Compile the model
googlenet_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
googlenet_model.summary()
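Because every layer of the base model is frozen, only the two Dense layers added on top should contribute trainable parameters. A rough count can be sketched with the usual formula for a fully connected layer (inputs × units + units). The helper `dense_params` is hypothetical, and the 2048-channel output of InceptionV3 without its top is stated here as an assumption; the formula itself is cross-checked against the `dense` and `dense_1` rows of the MobileNet summary earlier in this notebook.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Cross-check against the MobileNet summary shown earlier.
assert dense_params(1280, 128) == 163968   # dense
assert dense_params(128, 196) == 25284     # dense_1

# Assumption: InceptionV3 without its top ends in 2048 feature channels,
# so GlobalAveragePooling2D yields a 2048-dimensional vector.
inception_features = 2048
head_params = dense_params(inception_features, 1024) + dense_params(1024, 196)
print(f"trainable head parameters: {head_params:,}")
```

Under that assumption the head has 2,299,076 trainable parameters, which can be compared with the "Trainable params" line of the model summary.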
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0 []
conv2d (Conv2D) (None, 111, 111, 32) 864 ['input_2[0][0]']
batch_normalization_1 (Bat (None, 111, 111, 32) 96 ['conv2d[0][0]']
chNormalization)
activation (Activation) (None, 111, 111, 32) 0 ['batch_normalization_1[0][0]'
]
conv2d_1 (Conv2D) (None, 109, 109, 32) 9216 ['activation[0][0]']
batch_normalization_2 (Bat (None, 109, 109, 32) 96 ['conv2d_1[0][0]']
chNormalization)
activation_1 (Activation) (None, 109, 109, 32) 0 ['batch_normalization_2[0][0]'
]
conv2d_2 (Conv2D) (None, 109, 109, 64) 18432 ['activation_1[0][0]']
batch_normalization_3 (Bat (None, 109, 109, 64) 192 ['conv2d_2[0][0]']
chNormalization)
activation_2 (Activation) (None, 109, 109, 64) 0 ['batch_normalization_3[0][0]'
]
max_pooling2d (MaxPooling2 (None, 54, 54, 64) 0 ['activation_2[0][0]']
D)
conv2d_3 (Conv2D) (None, 54, 54, 80) 5120 ['max_pooling2d[0][0]']
batch_normalization_4 (Bat (None, 54, 54, 80) 240 ['conv2d_3[0][0]']
chNormalization)
activation_3 (Activation) (None, 54, 54, 80) 0 ['batch_normalization_4[0][0]'
]
conv2d_4 (Conv2D) (None, 52, 52, 192) 138240 ['activation_3[0][0]']
batch_normalization_5 (Bat (None, 52, 52, 192) 576 ['conv2d_4[0][0]']
chNormalization)
activation_4 (Activation) (None, 52, 52, 192) 0 ['batch_normalization_5[0][0]'
]
max_pooling2d_1 (MaxPoolin (None, 25, 25, 192) 0 ['activation_4[0][0]']
g2D)
conv2d_8 (Conv2D) (None, 25, 25, 64) 12288 ['max_pooling2d_1[0][0]']
batch_normalization_9 (Bat (None, 25, 25, 64) 192 ['conv2d_8[0][0]']
chNormalization)
activation_8 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_9[0][0]'
]
conv2d_6 (Conv2D) (None, 25, 25, 48) 9216 ['max_pooling2d_1[0][0]']
conv2d_9 (Conv2D) (None, 25, 25, 96) 55296 ['activation_8[0][0]']
batch_normalization_7 (Bat (None, 25, 25, 48) 144 ['conv2d_6[0][0]']
chNormalization)
batch_normalization_10 (Ba (None, 25, 25, 96) 288 ['conv2d_9[0][0]']
tchNormalization)
activation_6 (Activation) (None, 25, 25, 48) 0 ['batch_normalization_7[0][0]'
]
activation_9 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_10[0][0]
']
average_pooling2d (Average (None, 25, 25, 192) 0 ['max_pooling2d_1[0][0]']
Pooling2D)
conv2d_5 (Conv2D) (None, 25, 25, 64) 12288 ['max_pooling2d_1[0][0]']
conv2d_7 (Conv2D) (None, 25, 25, 64) 76800 ['activation_6[0][0]']
conv2d_10 (Conv2D) (None, 25, 25, 96) 82944 ['activation_9[0][0]']
conv2d_11 (Conv2D) (None, 25, 25, 32) 6144 ['average_pooling2d[0][0]']
batch_normalization_6 (Bat (None, 25, 25, 64) 192 ['conv2d_5[0][0]']
chNormalization)
batch_normalization_8 (Bat (None, 25, 25, 64) 192 ['conv2d_7[0][0]']
chNormalization)
batch_normalization_11 (Ba (None, 25, 25, 96) 288 ['conv2d_10[0][0]']
tchNormalization)
batch_normalization_12 (Ba (None, 25, 25, 32) 96 ['conv2d_11[0][0]']
tchNormalization)
activation_5 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_6[0][0]'
]
activation_7 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_8[0][0]'
]
activation_10 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_11[0][0]
']
activation_11 (Activation) (None, 25, 25, 32) 0 ['batch_normalization_12[0][0]
']
mixed0 (Concatenate) (None, 25, 25, 256) 0 ['activation_5[0][0]',
'activation_7[0][0]',
'activation_10[0][0]',
'activation_11[0][0]']
conv2d_15 (Conv2D) (None, 25, 25, 64) 16384 ['mixed0[0][0]']
batch_normalization_16 (Ba (None, 25, 25, 64) 192 ['conv2d_15[0][0]']
tchNormalization)
activation_15 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_16[0][0]
']
conv2d_13 (Conv2D) (None, 25, 25, 48) 12288 ['mixed0[0][0]']
conv2d_16 (Conv2D) (None, 25, 25, 96) 55296 ['activation_15[0][0]']
batch_normalization_14 (Ba (None, 25, 25, 48) 144 ['conv2d_13[0][0]']
tchNormalization)
batch_normalization_17 (Ba (None, 25, 25, 96) 288 ['conv2d_16[0][0]']
tchNormalization)
activation_13 (Activation) (None, 25, 25, 48) 0 ['batch_normalization_14[0][0]
']
activation_16 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_17[0][0]
']
average_pooling2d_1 (Avera (None, 25, 25, 256) 0 ['mixed0[0][0]']
gePooling2D)
conv2d_12 (Conv2D) (None, 25, 25, 64) 16384 ['mixed0[0][0]']
conv2d_14 (Conv2D) (None, 25, 25, 64) 76800 ['activation_13[0][0]']
conv2d_17 (Conv2D) (None, 25, 25, 96) 82944 ['activation_16[0][0]']
conv2d_18 (Conv2D) (None, 25, 25, 64) 16384 ['average_pooling2d_1[0][0]']
batch_normalization_13 (Ba (None, 25, 25, 64) 192 ['conv2d_12[0][0]']
tchNormalization)
batch_normalization_15 (Ba (None, 25, 25, 64) 192 ['conv2d_14[0][0]']
tchNormalization)
batch_normalization_18 (Ba (None, 25, 25, 96) 288 ['conv2d_17[0][0]']
tchNormalization)
batch_normalization_19 (Ba (None, 25, 25, 64) 192 ['conv2d_18[0][0]']
tchNormalization)
activation_12 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_13[0][0]
']
activation_14 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_15[0][0]
']
activation_17 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_18[0][0]
']
activation_18 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_19[0][0]
']
mixed1 (Concatenate) (None, 25, 25, 288) 0 ['activation_12[0][0]',
'activation_14[0][0]',
'activation_17[0][0]',
'activation_18[0][0]']
conv2d_22 (Conv2D) (None, 25, 25, 64) 18432 ['mixed1[0][0]']
batch_normalization_23 (Ba (None, 25, 25, 64) 192 ['conv2d_22[0][0]']
tchNormalization)
activation_22 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_23[0][0]
']
conv2d_20 (Conv2D) (None, 25, 25, 48) 13824 ['mixed1[0][0]']
conv2d_23 (Conv2D) (None, 25, 25, 96) 55296 ['activation_22[0][0]']
batch_normalization_21 (Ba (None, 25, 25, 48) 144 ['conv2d_20[0][0]']
tchNormalization)
batch_normalization_24 (Ba (None, 25, 25, 96) 288 ['conv2d_23[0][0]']
tchNormalization)
activation_20 (Activation) (None, 25, 25, 48) 0 ['batch_normalization_21[0][0]
']
activation_23 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_24[0][0]
']
average_pooling2d_2 (Avera (None, 25, 25, 288) 0 ['mixed1[0][0]']
gePooling2D)
conv2d_19 (Conv2D) (None, 25, 25, 64) 18432 ['mixed1[0][0]']
conv2d_21 (Conv2D) (None, 25, 25, 64) 76800 ['activation_20[0][0]']
conv2d_24 (Conv2D) (None, 25, 25, 96) 82944 ['activation_23[0][0]']
conv2d_25 (Conv2D) (None, 25, 25, 64) 18432 ['average_pooling2d_2[0][0]']
batch_normalization_20 (Ba (None, 25, 25, 64) 192 ['conv2d_19[0][0]']
tchNormalization)
batch_normalization_22 (Ba (None, 25, 25, 64) 192 ['conv2d_21[0][0]']
tchNormalization)
batch_normalization_25 (Ba (None, 25, 25, 96) 288 ['conv2d_24[0][0]']
tchNormalization)
batch_normalization_26 (Ba (None, 25, 25, 64) 192 ['conv2d_25[0][0]']
tchNormalization)
activation_19 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_20[0][0]
']
activation_21 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_22[0][0]
']
activation_24 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_25[0][0]
']
activation_25 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_26[0][0]
']
mixed2 (Concatenate) (None, 25, 25, 288) 0 ['activation_19[0][0]',
'activation_21[0][0]',
'activation_24[0][0]',
'activation_25[0][0]']
conv2d_27 (Conv2D) (None, 25, 25, 64) 18432 ['mixed2[0][0]']
batch_normalization_28 (Ba (None, 25, 25, 64) 192 ['conv2d_27[0][0]']
tchNormalization)
activation_27 (Activation) (None, 25, 25, 64) 0 ['batch_normalization_28[0][0]
']
conv2d_28 (Conv2D) (None, 25, 25, 96) 55296 ['activation_27[0][0]']
batch_normalization_29 (Ba (None, 25, 25, 96) 288 ['conv2d_28[0][0]']
tchNormalization)
activation_28 (Activation) (None, 25, 25, 96) 0 ['batch_normalization_29[0][0]
']
conv2d_26 (Conv2D) (None, 12, 12, 384) 995328 ['mixed2[0][0]']
conv2d_29 (Conv2D) (None, 12, 12, 96) 82944 ['activation_28[0][0]']
batch_normalization_27 (Ba (None, 12, 12, 384) 1152 ['conv2d_26[0][0]']
tchNormalization)
batch_normalization_30 (Ba (None, 12, 12, 96) 288 ['conv2d_29[0][0]']
tchNormalization)
activation_26 (Activation) (None, 12, 12, 384) 0 ['batch_normalization_27[0][0]
']
activation_29 (Activation) (None, 12, 12, 96) 0 ['batch_normalization_30[0][0]
']
max_pooling2d_2 (MaxPoolin (None, 12, 12, 288) 0 ['mixed2[0][0]']
g2D)
mixed3 (Concatenate) (None, 12, 12, 768) 0 ['activation_26[0][0]',
'activation_29[0][0]',
'max_pooling2d_2[0][0]']
conv2d_34 (Conv2D) (None, 12, 12, 128) 98304 ['mixed3[0][0]']
batch_normalization_35 (Ba (None, 12, 12, 128) 384 ['conv2d_34[0][0]']
tchNormalization)
activation_34 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_35[0][0]
']
conv2d_35 (Conv2D) (None, 12, 12, 128) 114688 ['activation_34[0][0]']
batch_normalization_36 (Ba (None, 12, 12, 128) 384 ['conv2d_35[0][0]']
tchNormalization)
activation_35 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_36[0][0]
']
conv2d_31 (Conv2D) (None, 12, 12, 128) 98304 ['mixed3[0][0]']
conv2d_36 (Conv2D) (None, 12, 12, 128) 114688 ['activation_35[0][0]']
batch_normalization_32 (Ba (None, 12, 12, 128) 384 ['conv2d_31[0][0]']
tchNormalization)
batch_normalization_37 (Ba (None, 12, 12, 128) 384 ['conv2d_36[0][0]']
tchNormalization)
activation_31 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_32[0][0]
']
activation_36 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_37[0][0]
']
conv2d_32 (Conv2D) (None, 12, 12, 128) 114688 ['activation_31[0][0]']
conv2d_37 (Conv2D) (None, 12, 12, 128) 114688 ['activation_36[0][0]']
batch_normalization_33 (Ba (None, 12, 12, 128) 384 ['conv2d_32[0][0]']
tchNormalization)
batch_normalization_38 (Ba (None, 12, 12, 128) 384 ['conv2d_37[0][0]']
tchNormalization)
activation_32 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_33[0][0]
']
activation_37 (Activation) (None, 12, 12, 128) 0 ['batch_normalization_38[0][0]
']
average_pooling2d_3 (Avera (None, 12, 12, 768) 0 ['mixed3[0][0]']
gePooling2D)
conv2d_30 (Conv2D) (None, 12, 12, 192) 147456 ['mixed3[0][0]']
conv2d_33 (Conv2D) (None, 12, 12, 192) 172032 ['activation_32[0][0]']
conv2d_38 (Conv2D) (None, 12, 12, 192) 172032 ['activation_37[0][0]']
conv2d_39 (Conv2D) (None, 12, 12, 192) 147456 ['average_pooling2d_3[0][0]']
batch_normalization_31 (Ba (None, 12, 12, 192) 576 ['conv2d_30[0][0]']
tchNormalization)
batch_normalization_34 (Ba (None, 12, 12, 192) 576 ['conv2d_33[0][0]']
tchNormalization)
batch_normalization_39 (Ba (None, 12, 12, 192) 576 ['conv2d_38[0][0]']
tchNormalization)
batch_normalization_40 (Ba (None, 12, 12, 192) 576 ['conv2d_39[0][0]']
tchNormalization)
activation_30 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_31[0][0]
']
activation_33 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_34[0][0]
']
activation_38 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_39[0][0]
']
activation_39 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_40[0][0]
']
mixed4 (Concatenate) (None, 12, 12, 768) 0 ['activation_30[0][0]',
'activation_33[0][0]',
'activation_38[0][0]',
'activation_39[0][0]']
conv2d_44 (Conv2D) (None, 12, 12, 160) 122880 ['mixed4[0][0]']
batch_normalization_45 (Ba (None, 12, 12, 160) 480 ['conv2d_44[0][0]']
tchNormalization)
activation_44 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_45[0][0]
']
conv2d_45 (Conv2D) (None, 12, 12, 160) 179200 ['activation_44[0][0]']
batch_normalization_46 (Ba (None, 12, 12, 160) 480 ['conv2d_45[0][0]']
tchNormalization)
activation_45 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_46[0][0]
']
conv2d_41 (Conv2D) (None, 12, 12, 160) 122880 ['mixed4[0][0]']
conv2d_46 (Conv2D) (None, 12, 12, 160) 179200 ['activation_45[0][0]']
batch_normalization_42 (Ba (None, 12, 12, 160) 480 ['conv2d_41[0][0]']
tchNormalization)
batch_normalization_47 (Ba (None, 12, 12, 160) 480 ['conv2d_46[0][0]']
tchNormalization)
activation_41 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_42[0][0]
']
activation_46 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_47[0][0]
']
conv2d_42 (Conv2D) (None, 12, 12, 160) 179200 ['activation_41[0][0]']
conv2d_47 (Conv2D) (None, 12, 12, 160) 179200 ['activation_46[0][0]']
batch_normalization_43 (Ba (None, 12, 12, 160) 480 ['conv2d_42[0][0]']
tchNormalization)
batch_normalization_48 (Ba (None, 12, 12, 160) 480 ['conv2d_47[0][0]']
tchNormalization)
activation_42 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_43[0][0]
']
activation_47 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_48[0][0]
']
average_pooling2d_4 (Avera (None, 12, 12, 768) 0 ['mixed4[0][0]']
gePooling2D)
conv2d_40 (Conv2D) (None, 12, 12, 192) 147456 ['mixed4[0][0]']
conv2d_43 (Conv2D) (None, 12, 12, 192) 215040 ['activation_42[0][0]']
conv2d_48 (Conv2D) (None, 12, 12, 192) 215040 ['activation_47[0][0]']
conv2d_49 (Conv2D) (None, 12, 12, 192) 147456 ['average_pooling2d_4[0][0]']
batch_normalization_41 (Ba (None, 12, 12, 192) 576 ['conv2d_40[0][0]']
tchNormalization)
batch_normalization_44 (Ba (None, 12, 12, 192) 576 ['conv2d_43[0][0]']
tchNormalization)
batch_normalization_49 (Ba (None, 12, 12, 192) 576 ['conv2d_48[0][0]']
tchNormalization)
batch_normalization_50 (Ba (None, 12, 12, 192) 576 ['conv2d_49[0][0]']
tchNormalization)
activation_40 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_41[0][0]
']
activation_43 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_44[0][0]
']
activation_48 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_49[0][0]
']
activation_49 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_50[0][0]
']
mixed5 (Concatenate) (None, 12, 12, 768) 0 ['activation_40[0][0]',
'activation_43[0][0]',
'activation_48[0][0]',
'activation_49[0][0]']
conv2d_54 (Conv2D) (None, 12, 12, 160) 122880 ['mixed5[0][0]']
batch_normalization_55 (Ba (None, 12, 12, 160) 480 ['conv2d_54[0][0]']
tchNormalization)
activation_54 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_55[0][0]
']
conv2d_55 (Conv2D) (None, 12, 12, 160) 179200 ['activation_54[0][0]']
batch_normalization_56 (Ba (None, 12, 12, 160) 480 ['conv2d_55[0][0]']
tchNormalization)
activation_55 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_56[0][0]
']
conv2d_51 (Conv2D) (None, 12, 12, 160) 122880 ['mixed5[0][0]']
conv2d_56 (Conv2D) (None, 12, 12, 160) 179200 ['activation_55[0][0]']
batch_normalization_52 (Ba (None, 12, 12, 160) 480 ['conv2d_51[0][0]']
tchNormalization)
batch_normalization_57 (Ba (None, 12, 12, 160) 480 ['conv2d_56[0][0]']
tchNormalization)
activation_51 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_52[0][0]
']
activation_56 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_57[0][0]
']
conv2d_52 (Conv2D) (None, 12, 12, 160) 179200 ['activation_51[0][0]']
conv2d_57 (Conv2D) (None, 12, 12, 160) 179200 ['activation_56[0][0]']
batch_normalization_53 (Ba (None, 12, 12, 160) 480 ['conv2d_52[0][0]']
tchNormalization)
batch_normalization_58 (Ba (None, 12, 12, 160) 480 ['conv2d_57[0][0]']
tchNormalization)
activation_52 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_53[0][0]
']
activation_57 (Activation) (None, 12, 12, 160) 0 ['batch_normalization_58[0][0]
']
average_pooling2d_5 (Avera (None, 12, 12, 768) 0 ['mixed5[0][0]']
gePooling2D)
conv2d_50 (Conv2D) (None, 12, 12, 192) 147456 ['mixed5[0][0]']
conv2d_53 (Conv2D) (None, 12, 12, 192) 215040 ['activation_52[0][0]']
conv2d_58 (Conv2D) (None, 12, 12, 192) 215040 ['activation_57[0][0]']
conv2d_59 (Conv2D) (None, 12, 12, 192) 147456 ['average_pooling2d_5[0][0]']
batch_normalization_51 (Ba (None, 12, 12, 192) 576 ['conv2d_50[0][0]']
tchNormalization)
batch_normalization_54 (Ba (None, 12, 12, 192) 576 ['conv2d_53[0][0]']
tchNormalization)
batch_normalization_59 (Ba (None, 12, 12, 192) 576 ['conv2d_58[0][0]']
tchNormalization)
batch_normalization_60 (Ba (None, 12, 12, 192) 576 ['conv2d_59[0][0]']
tchNormalization)
activation_50 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_51[0][0]
']
activation_53 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_54[0][0]
']
activation_58 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_59[0][0]
']
activation_59 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_60[0][0]
']
mixed6 (Concatenate) (None, 12, 12, 768) 0 ['activation_50[0][0]',
'activation_53[0][0]',
'activation_58[0][0]',
'activation_59[0][0]']
conv2d_64 (Conv2D) (None, 12, 12, 192) 147456 ['mixed6[0][0]']
batch_normalization_65 (Ba (None, 12, 12, 192) 576 ['conv2d_64[0][0]']
tchNormalization)
activation_64 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_65[0][0]
']
conv2d_65 (Conv2D) (None, 12, 12, 192) 258048 ['activation_64[0][0]']
batch_normalization_66 (Ba (None, 12, 12, 192) 576 ['conv2d_65[0][0]']
tchNormalization)
activation_65 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_66[0][0]
']
conv2d_61 (Conv2D) (None, 12, 12, 192) 147456 ['mixed6[0][0]']
conv2d_66 (Conv2D) (None, 12, 12, 192) 258048 ['activation_65[0][0]']
batch_normalization_62 (Ba (None, 12, 12, 192) 576 ['conv2d_61[0][0]']
tchNormalization)
batch_normalization_67 (Ba (None, 12, 12, 192) 576 ['conv2d_66[0][0]']
tchNormalization)
activation_61 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_62[0][0]
']
activation_66 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_67[0][0]
']
conv2d_62 (Conv2D) (None, 12, 12, 192) 258048 ['activation_61[0][0]']
conv2d_67 (Conv2D) (None, 12, 12, 192) 258048 ['activation_66[0][0]']
batch_normalization_63 (Ba (None, 12, 12, 192) 576 ['conv2d_62[0][0]']
tchNormalization)
batch_normalization_68 (Ba (None, 12, 12, 192) 576 ['conv2d_67[0][0]']
tchNormalization)
activation_62 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_63[0][0]
']
activation_67 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_68[0][0]
']
average_pooling2d_6 (Avera (None, 12, 12, 768) 0 ['mixed6[0][0]']
gePooling2D)
conv2d_60 (Conv2D) (None, 12, 12, 192) 147456 ['mixed6[0][0]']
conv2d_63 (Conv2D) (None, 12, 12, 192) 258048 ['activation_62[0][0]']
conv2d_68 (Conv2D) (None, 12, 12, 192) 258048 ['activation_67[0][0]']
conv2d_69 (Conv2D) (None, 12, 12, 192) 147456 ['average_pooling2d_6[0][0]']
batch_normalization_61 (Ba (None, 12, 12, 192) 576 ['conv2d_60[0][0]']
tchNormalization)
batch_normalization_64 (Ba (None, 12, 12, 192) 576 ['conv2d_63[0][0]']
tchNormalization)
batch_normalization_69 (Ba (None, 12, 12, 192) 576 ['conv2d_68[0][0]']
tchNormalization)
batch_normalization_70 (Ba (None, 12, 12, 192) 576 ['conv2d_69[0][0]']
tchNormalization)
activation_60 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_61[0][0]
']
activation_63 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_64[0][0]
']
activation_68 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_69[0][0]
']
activation_69 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_70[0][0]
']
mixed7 (Concatenate) (None, 12, 12, 768) 0 ['activation_60[0][0]',
'activation_63[0][0]',
'activation_68[0][0]',
'activation_69[0][0]']
conv2d_72 (Conv2D) (None, 12, 12, 192) 147456 ['mixed7[0][0]']
batch_normalization_73 (Ba (None, 12, 12, 192) 576 ['conv2d_72[0][0]']
tchNormalization)
activation_72 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_73[0][0]
']
conv2d_73 (Conv2D) (None, 12, 12, 192) 258048 ['activation_72[0][0]']
batch_normalization_74 (Ba (None, 12, 12, 192) 576 ['conv2d_73[0][0]']
tchNormalization)
activation_73 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_74[0][0]
']
conv2d_70 (Conv2D) (None, 12, 12, 192) 147456 ['mixed7[0][0]']
conv2d_74 (Conv2D) (None, 12, 12, 192) 258048 ['activation_73[0][0]']
batch_normalization_71 (Ba (None, 12, 12, 192) 576 ['conv2d_70[0][0]']
tchNormalization)
batch_normalization_75 (Ba (None, 12, 12, 192) 576 ['conv2d_74[0][0]']
tchNormalization)
activation_70 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_71[0][0]
']
activation_74 (Activation) (None, 12, 12, 192) 0 ['batch_normalization_75[0][0]
']
conv2d_71 (Conv2D) (None, 5, 5, 320) 552960 ['activation_70[0][0]']
conv2d_75 (Conv2D) (None, 5, 5, 192) 331776 ['activation_74[0][0]']
batch_normalization_72 (Ba (None, 5, 5, 320) 960 ['conv2d_71[0][0]']
tchNormalization)
batch_normalization_76 (Ba (None, 5, 5, 192) 576 ['conv2d_75[0][0]']
tchNormalization)
activation_71 (Activation) (None, 5, 5, 320) 0 ['batch_normalization_72[0][0]
']
activation_75 (Activation) (None, 5, 5, 192) 0 ['batch_normalization_76[0][0]
']
max_pooling2d_3 (MaxPoolin (None, 5, 5, 768) 0 ['mixed7[0][0]']
g2D)
mixed8 (Concatenate) (None, 5, 5, 1280) 0 ['activation_71[0][0]',
'activation_75[0][0]',
'max_pooling2d_3[0][0]']
conv2d_80 (Conv2D) (None, 5, 5, 448) 573440 ['mixed8[0][0]']
batch_normalization_81 (Ba (None, 5, 5, 448) 1344 ['conv2d_80[0][0]']
tchNormalization)
activation_80 (Activation) (None, 5, 5, 448) 0 ['batch_normalization_81[0][0]
']
conv2d_77 (Conv2D) (None, 5, 5, 384) 491520 ['mixed8[0][0]']
conv2d_81 (Conv2D) (None, 5, 5, 384) 1548288 ['activation_80[0][0]']
batch_normalization_78 (Ba (None, 5, 5, 384) 1152 ['conv2d_77[0][0]']
tchNormalization)
batch_normalization_82 (Ba (None, 5, 5, 384) 1152 ['conv2d_81[0][0]']
tchNormalization)
activation_77 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_78[0][0]
']
activation_81 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_82[0][0]
']
conv2d_78 (Conv2D) (None, 5, 5, 384) 442368 ['activation_77[0][0]']
conv2d_79 (Conv2D) (None, 5, 5, 384) 442368 ['activation_77[0][0]']
conv2d_82 (Conv2D) (None, 5, 5, 384) 442368 ['activation_81[0][0]']
conv2d_83 (Conv2D) (None, 5, 5, 384) 442368 ['activation_81[0][0]']
average_pooling2d_7 (Avera (None, 5, 5, 1280) 0 ['mixed8[0][0]']
gePooling2D)
conv2d_76 (Conv2D) (None, 5, 5, 320) 409600 ['mixed8[0][0]']
batch_normalization_79 (Ba (None, 5, 5, 384) 1152 ['conv2d_78[0][0]']
tchNormalization)
batch_normalization_80 (Ba (None, 5, 5, 384) 1152 ['conv2d_79[0][0]']
tchNormalization)
batch_normalization_83 (Ba (None, 5, 5, 384) 1152 ['conv2d_82[0][0]']
tchNormalization)
batch_normalization_84 (Ba (None, 5, 5, 384) 1152 ['conv2d_83[0][0]']
tchNormalization)
conv2d_84 (Conv2D) (None, 5, 5, 192) 245760 ['average_pooling2d_7[0][0]']
batch_normalization_77 (Ba (None, 5, 5, 320) 960 ['conv2d_76[0][0]']
tchNormalization)
activation_78 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_79[0][0]
']
activation_79 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_80[0][0]
']
activation_82 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_83[0][0]
']
activation_83 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_84[0][0]
']
batch_normalization_85 (Ba (None, 5, 5, 192) 576 ['conv2d_84[0][0]']
tchNormalization)
activation_76 (Activation) (None, 5, 5, 320) 0 ['batch_normalization_77[0][0]
']
mixed9_0 (Concatenate) (None, 5, 5, 768) 0 ['activation_78[0][0]',
'activation_79[0][0]']
concatenate (Concatenate) (None, 5, 5, 768) 0 ['activation_82[0][0]',
'activation_83[0][0]']
activation_84 (Activation) (None, 5, 5, 192) 0 ['batch_normalization_85[0][0]
']
mixed9 (Concatenate) (None, 5, 5, 2048) 0 ['activation_76[0][0]',
'mixed9_0[0][0]',
'concatenate[0][0]',
'activation_84[0][0]']
conv2d_89 (Conv2D) (None, 5, 5, 448) 917504 ['mixed9[0][0]']
batch_normalization_90 (Ba (None, 5, 5, 448) 1344 ['conv2d_89[0][0]']
tchNormalization)
activation_89 (Activation) (None, 5, 5, 448) 0 ['batch_normalization_90[0][0]
']
conv2d_86 (Conv2D) (None, 5, 5, 384) 786432 ['mixed9[0][0]']
conv2d_90 (Conv2D) (None, 5, 5, 384) 1548288 ['activation_89[0][0]']
batch_normalization_87 (Ba (None, 5, 5, 384) 1152 ['conv2d_86[0][0]']
tchNormalization)
batch_normalization_91 (Ba (None, 5, 5, 384) 1152 ['conv2d_90[0][0]']
tchNormalization)
activation_86 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_87[0][0]
']
activation_90 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_91[0][0]
']
conv2d_87 (Conv2D) (None, 5, 5, 384) 442368 ['activation_86[0][0]']
conv2d_88 (Conv2D) (None, 5, 5, 384) 442368 ['activation_86[0][0]']
conv2d_91 (Conv2D) (None, 5, 5, 384) 442368 ['activation_90[0][0]']
conv2d_92 (Conv2D) (None, 5, 5, 384) 442368 ['activation_90[0][0]']
average_pooling2d_8 (Avera (None, 5, 5, 2048) 0 ['mixed9[0][0]']
gePooling2D)
conv2d_85 (Conv2D) (None, 5, 5, 320) 655360 ['mixed9[0][0]']
batch_normalization_88 (Ba (None, 5, 5, 384) 1152 ['conv2d_87[0][0]']
tchNormalization)
batch_normalization_89 (Ba (None, 5, 5, 384) 1152 ['conv2d_88[0][0]']
tchNormalization)
batch_normalization_92 (Ba (None, 5, 5, 384) 1152 ['conv2d_91[0][0]']
tchNormalization)
batch_normalization_93 (Ba (None, 5, 5, 384) 1152 ['conv2d_92[0][0]']
tchNormalization)
conv2d_93 (Conv2D) (None, 5, 5, 192) 393216 ['average_pooling2d_8[0][0]']
batch_normalization_86 (Ba (None, 5, 5, 320) 960 ['conv2d_85[0][0]']
tchNormalization)
activation_87 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_88[0][0]
']
activation_88 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_89[0][0]
']
activation_91 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_92[0][0]
']
activation_92 (Activation) (None, 5, 5, 384) 0 ['batch_normalization_93[0][0]
']
batch_normalization_94 (Ba (None, 5, 5, 192) 576 ['conv2d_93[0][0]']
tchNormalization)
activation_85 (Activation) (None, 5, 5, 320) 0 ['batch_normalization_86[0][0]
']
mixed9_1 (Concatenate) (None, 5, 5, 768) 0 ['activation_87[0][0]',
'activation_88[0][0]']
concatenate_1 (Concatenate (None, 5, 5, 768) 0 ['activation_91[0][0]',
) 'activation_92[0][0]']
activation_93 (Activation) (None, 5, 5, 192) 0 ['batch_normalization_94[0][0]
']
mixed10 (Concatenate) (None, 5, 5, 2048) 0 ['activation_85[0][0]',
'mixed9_1[0][0]',
'concatenate_1[0][0]',
'activation_93[0][0]']
global_average_pooling2d_1 (None, 2048) 0 ['mixed10[0][0]']
(GlobalAveragePooling2D)
dense_2 (Dense) (None, 1024) 2098176 ['global_average_pooling2d_1[0
][0]']
dense_3 (Dense) (None, 196) 200900 ['dense_2[0][0]']
==================================================================================================
Total params: 24101860 (91.94 MB)
Trainable params: 2299076 (8.77 MB)
Non-trainable params: 21802784 (83.17 MB)
__________________________________________________________________________________________________
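As a sanity check on the summary above, the trainable-parameter count can be reproduced by hand, because the only trainable weights reported are those of the two Dense layers on top of the frozen base. This is a minimal sketch, assuming the head consists of exactly the `dense_2` and `dense_3` layers shown; the helper name `dense_params` is hypothetical and not part of the notebook.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

# Shapes read from the summary: pooled features (2048) -> dense_2 (1024) -> dense_3 (196)
head = dense_params(2048, 1024) + dense_params(1024, 196)
print(head)  # 2299076, the "Trainable params" value in the summary
```

The same arithmetic gives the per-layer values 2098176 and 200900 listed for `dense_2` and `dense_3`.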
# Train the GoogleNet model on batches from the generators
googlenet_batch_size = 16
history_googlenet = googlenet_model.fit(
    train_generator,  # Uses batches from the generator
    steps_per_epoch=len(df_train) // googlenet_batch_size,  # Number of batches per epoch
    epochs=10,
    validation_data=val_generator,  # Uses batches from the validation generator
    validation_steps=len(df_val) // googlenet_batch_size,  # Number of validation batches per epoch
)
Epoch 1/10
407/407 [==============================] - 50s 84ms/step - loss: 4.4295 - accuracy: 0.0612 - val_loss: 3.8474 - val_accuracy: 0.0973
Epoch 2/10
407/407 [==============================] - 26s 64ms/step - loss: 3.4477 - accuracy: 0.1557 - val_loss: 3.6378 - val_accuracy: 0.1414
Epoch 3/10
407/407 [==============================] - 26s 64ms/step - loss: 3.0100 - accuracy: 0.2363 - val_loss: 3.5258 - val_accuracy: 0.1618
Epoch 4/10
407/407 [==============================] - 26s 63ms/step - loss: 2.6691 - accuracy: 0.3088 - val_loss: 3.4320 - val_accuracy: 0.1804
Epoch 5/10
407/407 [==============================] - 26s 63ms/step - loss: 2.3804 - accuracy: 0.3737 - val_loss: 3.3786 - val_accuracy: 0.2021
Epoch 6/10
407/407 [==============================] - 25s 61ms/step - loss: 2.1409 - accuracy: 0.4304 - val_loss: 3.4179 - val_accuracy: 0.2170
Epoch 7/10
407/407 [==============================] - 23s 57ms/step - loss: 1.9342 - accuracy: 0.4738 - val_loss: 3.4171 - val_accuracy: 0.2393
Epoch 8/10
407/407 [==============================] - 22s 53ms/step - loss: 1.7375 - accuracy: 0.5236 - val_loss: 3.5221 - val_accuracy: 0.2350
Epoch 9/10
407/407 [==============================] - 21s 52ms/step - loss: 1.5777 - accuracy: 0.5707 - val_loss: 3.5486 - val_accuracy: 0.2399
Epoch 10/10
407/407 [==============================] - 21s 52ms/step - loss: 1.4264 - accuracy: 0.6093 - val_loss: 3.6738 - val_accuracy: 0.2467
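The log above can be read more easily by locating the epoch with the lowest validation loss and comparing the train/validation gap at the start and end of training. The sketch below copies the loss values from the log by hand; in the notebook they would presumably be available as `history_googlenet.history["loss"]` and `history_googlenet.history["val_loss"]`, which is an assumption about the standard Keras `History` object rather than something shown here.

```python
# Loss per epoch, copied from the training log above
train_loss = [4.4295, 3.4477, 3.0100, 2.6691, 2.3804,
              2.1409, 1.9342, 1.7375, 1.5777, 1.4264]
val_loss = [3.8474, 3.6378, 3.5258, 3.4320, 3.3786,
            3.4179, 3.4171, 3.5221, 3.5486, 3.6738]

# Epoch (1-based) with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

# Validation loss minus training loss; a widening gap suggests overfitting
gaps = [v - t for v, t in zip(val_loss, train_loss)]

print(best_epoch)  # 5
print(f"gap at epoch 1: {gaps[0]:.2f}, gap at epoch 10: {gaps[-1]:.2f}")
```

Validation loss stops improving after epoch 5 while training loss keeps falling, so stopping near that epoch (for example with Keras's `EarlyStopping` callback monitoring `val_loss`) would be one option to explore.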
#display model accuracy vs model loss
plot_training_history(history_googlenet)
y_pred, y_true, df_googlenet_classification_report = generate_classification_report_tf_model(
    model=googlenet_model,
    df_val=df_val,
    label_encoder=label_encoder,
    preprocess_fn=googlenet_preprocess,
    batch_size=32,
    report_name="googlenet_classification_report.csv"
)
51/51 [==============================] - 10s 43ms/step
Model Accuracy: 0.2474
Classification Report:
Report saved as: googlenet_classification_report.csv
Model Accuracy: 0.2474
Average Summary Metrics:
precision recall f1-score
macro avg 0.354999 0.242632 0.223415
weighted avg 0.376060 0.247391 0.237403
overall_accuracy 0.247391 NaN NaN
GoogleNet confusion matrix for the 10 classes with the highest support
df_support = df_googlenet_classification_report.iloc[:-3] # exclude average rows
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
top_10_indices = [np.where(label_encoder.classes_ == cls)[0][0] for cls in top_10_classes]
googlenet_cm = confusion_matrix(y_true, y_pred)
googlenet_cm_top10 = googlenet_cm[np.ix_(top_10_indices, top_10_indices)]
plt.figure(figsize=(10, 8))
sns.heatmap(googlenet_cm_top10, annot=True, fmt='d',
            xticklabels=top_10_classes, yticklabels=top_10_classes,
            cmap='Blues')
plt.title("GoogleNet Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
GoogleNet Summary
- GoogleNet's inception modules support multi-scale feature learning, but the model overfitted: training accuracy reached ~61% by epoch 10 while validation accuracy stagnated at ~25%.
- Training loss fell steadily (4.43 to 1.43), whereas validation loss bottomed out at 3.38 in epoch 5 and then rose to 3.67, confirming poor generalization.
- Classification metrics on the validation set: accuracy ~25%, macro precision ~35% (weighted ~38%), macro recall ~24%, and macro F1-score ~22%.
- Performance may also have been affected by class imbalance. Given the large gap between training and validation results, the model as trained is unsuitable for final use.
Key Issues Identified:
- Potential class imbalance.
- High variance (overfitting): good performance on training data but poor performance on validation data.
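Class imbalance is listed above only as a potential issue, and one quick way to confirm or rule it out is to compare per-class sample counts. The sketch below is illustrative only: the helper `imbalance_ratio` and the toy labels are hypothetical, and in the notebook one would pass a real label column (such as the `labels` column used elsewhere) instead.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Return (largest class size / smallest class size, per-class counts)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values()), counts

# Hypothetical labels for illustration only, not taken from the dataset
toy_labels = ["audi"] * 6 + ["bmw"] * 3 + ["ford"] * 2
ratio, counts = imbalance_ratio(toy_labels)

print(ratio)                  # 3.0
print(counts.most_common(1))  # [('audi', 6)]
```

A ratio close to 1 would suggest that imbalance is not the main problem and that the overfitting noted above deserves attention first.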
6C. AlexNet
import os  # needed to build the image path below

# Define paths
#image_dir = 'Car_Images/Car Images/Test Images' # Adjust based on your directory structure
image_dir = 'car_data/car_data/test'
# Prepare data
images = []
labels = []
for index, row in test_annotations_df.iterrows():
    #image_name = row['Image Name']
    image_name = row['image_name']
    # Build the full path. Assumption: image_name is relative to image_dir;
    # adjust this if the images are nested in per-class subfolders.
    image_path = os.path.join(image_dir, image_name)
    # Load and preprocess the image
    image = cv2.imread(image_path)
    image = cv2.resize(image, (224, 224))  # Resize to 224x224 to match the input_shape used for the model below
    images.append(image)
    # Assuming 'Image class' contains the class label
    #labels.append(row['Image class'])
    labels.append(row['image_class'])
# Convert to numpy arrays
images = np.array(images)
labels = np.array(labels)
# Encode labels
unique_classes = np.unique(labels)
def create_alexnet_model(input_shape, num_classes):
    model = Sequential()
    # First Convolutional Layer
    model.add(Conv2D(96, (11, 11), strides=(4, 4), activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(BatchNormalization())
    # Second Convolutional Layer
    model.add(Conv2D(256, (5, 5), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(BatchNormalization())
    # Third Convolutional Layer
    model.add(Conv2D(384, (3, 3), padding='same', activation='relu'))
    # Fourth Convolutional Layer
    model.add(Conv2D(384, (3, 3), padding='same', activation='relu'))
    # Fifth Convolutional Layer
    model.add(Conv2D(256, (3, 3), padding='same', activation='relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))
    model.add(BatchNormalization())
    # Flatten the output
    model.add(Flatten())
    # Fully Connected Layers
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes, activation='softmax'))
    return model
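Because Keras defaults to 'valid' padding for `Conv2D` and `MaxPooling2D`, the first convolution and every pooling layer in the function above shrink the feature map, so the size of the flattened vector depends on the input resolution. The sketch below traces the spatial size through the layers; the helper names `out_size` and `alexnet_flatten_units` are hypothetical and only mirror the layer sequence defined above.

```python
def out_size(size: int, kernel: int, stride: int = 1, same: bool = False) -> int:
    """Spatial output size of a conv/pool layer ('valid' padding unless same=True)."""
    if same:
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

def alexnet_flatten_units(input_size: int) -> int:
    s = out_size(input_size, 11, 4)  # conv 11x11, stride 4, valid padding
    s = out_size(s, 3, 2)            # max pool 3x3, stride 2
    s = out_size(s, 5, same=True)    # conv 5x5, same padding
    s = out_size(s, 3, 2)            # max pool 3x3, stride 2
    s = out_size(s, 3, same=True)    # the three 3x3 'same' convs keep the size
    s = out_size(s, 3, 2)            # max pool 3x3, stride 2
    return s * s * 256               # 256 channels after the last conv

print(alexnet_flatten_units(224))  # 6400
print(alexnet_flatten_units(227))  # 9216
```

A 224x224 input yields 6400 flattened units, whereas the 227x227 input often quoted for AlexNet would yield 9216, so the resize step and `input_shape` need to agree.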
# Create the model
input_shape = (224, 224, 3)  # Input dimensions used for this AlexNet variant
#num_classes = len(unique_classes)
num_classes = len(df_training['labels'].unique())
model = create_alexnet_model(input_shape, num_classes)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Data augmentation
# Note: datagen is not referenced by the fit call below, which reads from train_generator
datagen = ImageDataGenerator(rotation_range=20, width_shift_range=0.2,
                             height_shift_range=0.2, shear_range=0.2,
                             zoom_range=0.2, horizontal_flip=True,
                             fill_mode='nearest')
epochs = 10
#batch_size=32
batch_size = 16
train_steps = len(df_train) // batch_size
val_steps = len(df_val) // batch_size
model.summary()
# The generators already yield batches, so batch_size is not passed to fit()
alexnet_history = model.fit(
    train_generator,
    steps_per_epoch=train_steps,
    epochs=epochs,
    validation_data=val_generator,
    validation_steps=val_steps
)
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_94 (Conv2D) (None, 54, 54, 96) 34944
max_pooling2d_4 (MaxPoolin (None, 26, 26, 96) 0
g2D)
batch_normalization_95 (Ba (None, 26, 26, 96) 384
tchNormalization)
conv2d_95 (Conv2D) (None, 26, 26, 256) 614656
max_pooling2d_5 (MaxPoolin (None, 12, 12, 256) 0
g2D)
batch_normalization_96 (Ba (None, 12, 12, 256) 1024
tchNormalization)
conv2d_96 (Conv2D) (None, 12, 12, 384) 885120
conv2d_97 (Conv2D) (None, 12, 12, 384) 1327488
conv2d_98 (Conv2D) (None, 12, 12, 256) 884992
max_pooling2d_6 (MaxPoolin (None, 5, 5, 256) 0
g2D)
batch_normalization_97 (Ba (None, 5, 5, 256) 1024
tchNormalization)
flatten (Flatten) (None, 6400) 0
dense_4 (Dense) (None, 4096) 26218496
dropout_1 (Dropout) (None, 4096) 0
dense_5 (Dense) (None, 4096) 16781312
dropout_2 (Dropout) (None, 4096) 0
dense_6 (Dense) (None, 196) 803012
=================================================================
Total params: 47552452 (181.40 MB)
Trainable params: 47551236 (181.39 MB)
Non-trainable params: 1216 (4.75 KB)
_________________________________________________________________
Epoch 1/10
407/407 [==============================] - 31s 66ms/step - loss: 5.4835 - accuracy: 0.0048 - val_loss: 5.2815 - val_accuracy: 0.0087
Epoch 2/10
407/407 [==============================] - 26s 63ms/step - loss: 5.2781 - accuracy: 0.0080 - val_loss: 5.2859 - val_accuracy: 0.0087
Epoch 3/10
407/407 [==============================] - 25s 62ms/step - loss: 5.2760 - accuracy: 0.0083 - val_loss: 5.2910 - val_accuracy: 0.0087
Epoch 4/10
407/407 [==============================] - 25s 62ms/step - loss: 5.2756 - accuracy: 0.0083 - val_loss: 5.2935 - val_accuracy: 0.0087
Epoch 5/10
407/407 [==============================] - 24s 60ms/step - loss: 5.2753 - accuracy: 0.0083 - val_loss: 5.2986 - val_accuracy: 0.0087
Epoch 6/10
407/407 [==============================] - 23s 57ms/step - loss: 5.2754 - accuracy: 0.0083 - val_loss: 5.2960 - val_accuracy: 0.0087
Epoch 7/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2737 - accuracy: 0.0083 - val_loss: 5.2976 - val_accuracy: 0.0087
Epoch 8/10
407/407 [==============================] - 21s 52ms/step - loss: 5.5139 - accuracy: 0.0066 - val_loss: 14.9525 - val_accuracy: 0.0037
Epoch 9/10
407/407 [==============================] - 21s 52ms/step - loss: 5.3022 - accuracy: 0.0080 - val_loss: 5.2965 - val_accuracy: 0.0087
Epoch 10/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2747 - accuracy: 0.0083 - val_loss: 5.3342 - val_accuracy: 0.0087
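To put the log above in context, it helps to know what a classifier that guesses uniformly over all classes would score. The sketch below assumes 196 classes, as given by the size of the final Dense layer in the summary, and computes the chance-level accuracy and the categorical cross-entropy of a uniform prediction.

```python
import math

num_classes = 196  # size of the final Dense layer in the summary above

chance_accuracy = 1 / num_classes
chance_loss = math.log(num_classes)  # cross-entropy of a uniform prediction

print(f"{chance_accuracy:.4f}")  # 0.0051
print(f"{chance_loss:.4f}")      # 5.2781
```

The logged loss of about 5.27 to 5.30 and accuracy below 1% sit almost exactly at these values, which indicates that the network has not learned beyond chance.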
#display model accuracy vs loss
plot_training_history(alexnet_history)
X_val = np.array([img for img in df_val['image']])
y_val_true = np.array([np.argmax(label) for label in df_val['label_categorical']])
# Predict in one go
y_val_pred = np.argmax(model.predict(X_val), axis=1)
51/51 [==============================] - 1s 12ms/step
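# Note: zero_division=1 scores classes that are never predicted as precision 1.0,
# which inflates the averaged precision when the model predicts only a few classes
alexnet_report = classification_report(
    y_val_true,
    y_val_pred,
    target_names=label_encoder.classes_,
    output_dict=True,
    zero_division=1
)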
acc = accuracy_score(y_val_true, y_val_pred)
df_alexnet_classification_report = pd.DataFrame(alexnet_report).transpose()
df_alexnet_classification_report.loc["overall_accuracy"] = [acc, None, None, None]
df_alexnet_classification_report.to_csv("alexnet_classification_report_vectorized.csv")
print(f"Accuracy Score: {acc:.4f}")
print("Average Summary Metrics:")
print(df_alexnet_classification_report.tail(3)[["precision", "recall", "f1-score"]])
Accuracy Score: 0.0086
Average Summary Metrics:
precision recall f1-score
macro avg 0.989840 0.005102 0.000087
weighted avg 0.987796 0.008594 0.000147
overall_accuracy 0.008594 NaN NaN
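The macro precision of ~0.99 next to a recall of ~0.005 looks contradictory, but it follows from `zero_division=1`: every class that is never predicted has an undefined precision, which is then counted as 1. The sketch below reproduces the effect in plain Python on hypothetical toy data; `macro_precision` is an illustrative helper that mimics this behaviour, not scikit-learn's implementation.

```python
def macro_precision(y_true, y_pred, n_classes, zero_division=0.0):
    """Macro-averaged precision; classes that are never predicted get `zero_division`."""
    per_class = []
    for c in range(n_classes):
        predicted = sum(1 for p in y_pred if p == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == c)
        per_class.append(correct / predicted if predicted else zero_division)
    return sum(per_class) / n_classes

# Toy data (hypothetical): 4 balanced classes, model always predicts class 0
y_true = [0, 1, 2, 3] * 5
y_pred = [0] * 20

print(macro_precision(y_true, y_pred, 4, zero_division=1.0))  # 0.8125
print(macro_precision(y_true, y_pred, 4, zero_division=0.0))  # 0.0625
```

With 196 classes and a model that predicts a single class, the same mechanism pushes macro precision towards 1, so recall and F1-score are the more trustworthy figures in the table above.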
Confusion matrix
df_support = df_alexnet_classification_report.iloc[:-3] # exclude average rows
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
top_10_indices = [np.where(label_encoder.classes_ == cls)[0][0] for cls in top_10_classes]
alexnet_cm = confusion_matrix(y_val_true, y_val_pred)
alexnet_cm_top10 = alexnet_cm[np.ix_(top_10_indices, top_10_indices)]
plt.figure(figsize=(10, 8))
sns.heatmap(alexnet_cm_top10, annot=True, fmt='d',
            xticklabels=top_10_classes, yticklabels=top_10_classes,
            cmap='Blues')
plt.title("AlexNet Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
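Observations for the AlexNet Model
- The model is not learning: training accuracy stays at ~0.8% and validation accuracy at ~0.9% across all 10 epochs, which is roughly chance level for 196 classes (1/196 ≈ 0.5%).
- Training and validation loss plateau at about 5.27 to 5.33 (with a single validation spike to 14.95 in epoch 8). This is close to ln(196) ≈ 5.28, the loss of a uniform prediction, so the network is effectively guessing.
- The NaN entries in the overall_accuracy row of the report are only placeholders for columns that do not apply to accuracy. The real signal is the macro recall of ~0.5% (1/196) and an F1-score near 0, which is consistent with the model predicting a single class for every image. The ~0.99 precision is an artifact of zero_division=1, which assigns precision 1 to classes that are never predicted.
Further actions that can be taken:
- Increase data augmentation on the training set.
- Check for possible class imbalance.
- Tune hyperparameters, for example the learning rate (the model is compiled with Adam's default settings).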
6D. ResNet
# Load ResNet50 base model without the top layer
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze base model layers
for layer in base_model.layers:
    layer.trainable = False
# Add custom classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(len(df_training['labels_encoded'].unique()), activation='softmax')(x) # Output layer
# Define model
resnet_model = Model(inputs=base_model.input, outputs=x)
# Compile model
resnet_model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])
# Print model summary
resnet_model.summary()
Model: "model_2"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 224, 224, 3)] 0 []
conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0 ['input_3[0][0]']
conv1_conv (Conv2D) (None, 112, 112, 64) 9472 ['conv1_pad[0][0]']
conv1_bn (BatchNormalizati (None, 112, 112, 64) 256 ['conv1_conv[0][0]']
on)
conv1_relu (Activation) (None, 112, 112, 64) 0 ['conv1_bn[0][0]']
pool1_pad (ZeroPadding2D) (None, 114, 114, 64) 0 ['conv1_relu[0][0]']
pool1_pool (MaxPooling2D) (None, 56, 56, 64) 0 ['pool1_pad[0][0]']
conv2_block1_1_conv (Conv2 (None, 56, 56, 64) 4160 ['pool1_pool[0][0]']
D)
conv2_block1_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block1_1_conv[0][0]']
rmalization)
conv2_block1_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block1_1_bn[0][0]']
ation)
conv2_block1_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block1_1_relu[0][0]']
D)
conv2_block1_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block1_2_conv[0][0]']
rmalization)
conv2_block1_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block1_2_bn[0][0]']
ation)
conv2_block1_0_conv (Conv2 (None, 56, 56, 256) 16640 ['pool1_pool[0][0]']
D)
conv2_block1_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block1_2_relu[0][0]']
D)
conv2_block1_0_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block1_0_conv[0][0]']
rmalization)
conv2_block1_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block1_3_conv[0][0]']
rmalization)
conv2_block1_add (Add) (None, 56, 56, 256) 0 ['conv2_block1_0_bn[0][0]',
'conv2_block1_3_bn[0][0]']
conv2_block1_out (Activati (None, 56, 56, 256) 0 ['conv2_block1_add[0][0]']
on)
conv2_block2_1_conv (Conv2 (None, 56, 56, 64) 16448 ['conv2_block1_out[0][0]']
D)
conv2_block2_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block2_1_conv[0][0]']
rmalization)
conv2_block2_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block2_1_bn[0][0]']
ation)
conv2_block2_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block2_1_relu[0][0]']
D)
conv2_block2_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block2_2_conv[0][0]']
rmalization)
conv2_block2_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block2_2_bn[0][0]']
ation)
conv2_block2_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block2_2_relu[0][0]']
D)
conv2_block2_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block2_3_conv[0][0]']
rmalization)
conv2_block2_add (Add) (None, 56, 56, 256) 0 ['conv2_block1_out[0][0]',
'conv2_block2_3_bn[0][0]']
conv2_block2_out (Activati (None, 56, 56, 256) 0 ['conv2_block2_add[0][0]']
on)
conv2_block3_1_conv (Conv2 (None, 56, 56, 64) 16448 ['conv2_block2_out[0][0]']
D)
conv2_block3_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block3_1_conv[0][0]']
rmalization)
conv2_block3_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block3_1_bn[0][0]']
ation)
conv2_block3_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block3_1_relu[0][0]']
D)
conv2_block3_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block3_2_conv[0][0]']
rmalization)
conv2_block3_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block3_2_bn[0][0]']
ation)
conv2_block3_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block3_2_relu[0][0]']
D)
conv2_block3_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block3_3_conv[0][0]']
rmalization)
conv2_block3_add (Add) (None, 56, 56, 256) 0 ['conv2_block2_out[0][0]',
'conv2_block3_3_bn[0][0]']
conv2_block3_out (Activati (None, 56, 56, 256) 0 ['conv2_block3_add[0][0]']
on)
conv3_block1_1_conv (Conv2 (None, 28, 28, 128) 32896 ['conv2_block3_out[0][0]']
D)
conv3_block1_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block1_1_conv[0][0]']
rmalization)
conv3_block1_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block1_1_bn[0][0]']
ation)
conv3_block1_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block1_1_relu[0][0]']
D)
conv3_block1_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block1_2_conv[0][0]']
rmalization)
conv3_block1_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block1_2_bn[0][0]']
ation)
conv3_block1_0_conv (Conv2 (None, 28, 28, 512) 131584 ['conv2_block3_out[0][0]']
D)
conv3_block1_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block1_2_relu[0][0]']
D)
conv3_block1_0_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block1_0_conv[0][0]']
rmalization)
conv3_block1_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block1_3_conv[0][0]']
rmalization)
conv3_block1_add (Add) (None, 28, 28, 512) 0 ['conv3_block1_0_bn[0][0]',
'conv3_block1_3_bn[0][0]']
conv3_block1_out (Activati (None, 28, 28, 512) 0 ['conv3_block1_add[0][0]']
on)
conv3_block2_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block1_out[0][0]']
D)
conv3_block2_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block2_1_conv[0][0]']
rmalization)
conv3_block2_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block2_1_bn[0][0]']
ation)
conv3_block2_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block2_1_relu[0][0]']
D)
conv3_block2_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block2_2_conv[0][0]']
rmalization)
conv3_block2_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block2_2_bn[0][0]']
ation)
conv3_block2_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block2_2_relu[0][0]']
D)
conv3_block2_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block2_3_conv[0][0]']
rmalization)
conv3_block2_add (Add) (None, 28, 28, 512) 0 ['conv3_block1_out[0][0]',
'conv3_block2_3_bn[0][0]']
conv3_block2_out (Activati (None, 28, 28, 512) 0 ['conv3_block2_add[0][0]']
on)
conv3_block3_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block2_out[0][0]']
D)
conv3_block3_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block3_1_conv[0][0]']
rmalization)
conv3_block3_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block3_1_bn[0][0]']
ation)
conv3_block3_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block3_1_relu[0][0]']
D)
conv3_block3_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block3_2_conv[0][0]']
rmalization)
conv3_block3_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block3_2_bn[0][0]']
ation)
conv3_block3_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block3_2_relu[0][0]']
D)
conv3_block3_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block3_3_conv[0][0]']
rmalization)
conv3_block3_add (Add) (None, 28, 28, 512) 0 ['conv3_block2_out[0][0]',
'conv3_block3_3_bn[0][0]']
conv3_block3_out (Activati (None, 28, 28, 512) 0 ['conv3_block3_add[0][0]']
on)
conv3_block4_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block3_out[0][0]']
D)
conv3_block4_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block4_1_conv[0][0]']
rmalization)
conv3_block4_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block4_1_bn[0][0]']
ation)
conv3_block4_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block4_1_relu[0][0]']
D)
conv3_block4_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block4_2_conv[0][0]']
rmalization)
conv3_block4_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block4_2_bn[0][0]']
ation)
conv3_block4_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block4_2_relu[0][0]']
D)
conv3_block4_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block4_3_conv[0][0]']
rmalization)
conv3_block4_add (Add) (None, 28, 28, 512) 0 ['conv3_block3_out[0][0]',
'conv3_block4_3_bn[0][0]']
conv3_block4_out (Activati (None, 28, 28, 512) 0 ['conv3_block4_add[0][0]']
on)
conv4_block1_1_conv (Conv2 (None, 14, 14, 256) 131328 ['conv3_block4_out[0][0]']
D)
conv4_block1_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block1_1_conv[0][0]']
rmalization)
conv4_block1_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block1_1_bn[0][0]']
ation)
conv4_block1_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block1_1_relu[0][0]']
D)
conv4_block1_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block1_2_conv[0][0]']
rmalization)
conv4_block1_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block1_2_bn[0][0]']
ation)
conv4_block1_0_conv (Conv2 (None, 14, 14, 1024) 525312 ['conv3_block4_out[0][0]']
D)
conv4_block1_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block1_2_relu[0][0]']
D)
conv4_block1_0_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block1_0_conv[0][0]']
rmalization)
conv4_block1_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block1_3_conv[0][0]']
rmalization)
conv4_block1_add (Add) (None, 14, 14, 1024) 0 ['conv4_block1_0_bn[0][0]',
'conv4_block1_3_bn[0][0]']
conv4_block1_out (Activati (None, 14, 14, 1024) 0 ['conv4_block1_add[0][0]']
on)
conv4_block2_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block1_out[0][0]']
D)
conv4_block2_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block2_1_conv[0][0]']
rmalization)
conv4_block2_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block2_1_bn[0][0]']
ation)
conv4_block2_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block2_1_relu[0][0]']
D)
conv4_block2_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block2_2_conv[0][0]']
rmalization)
conv4_block2_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block2_2_bn[0][0]']
ation)
conv4_block2_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block2_2_relu[0][0]']
D)
conv4_block2_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block2_3_conv[0][0]']
rmalization)
conv4_block2_add (Add) (None, 14, 14, 1024) 0 ['conv4_block1_out[0][0]',
'conv4_block2_3_bn[0][0]']
conv4_block2_out (Activati (None, 14, 14, 1024) 0 ['conv4_block2_add[0][0]']
on)
conv4_block3_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block2_out[0][0]']
D)
conv4_block3_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block3_1_conv[0][0]']
rmalization)
conv4_block3_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block3_1_bn[0][0]']
ation)
conv4_block3_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block3_1_relu[0][0]']
D)
conv4_block3_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block3_2_conv[0][0]']
rmalization)
conv4_block3_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block3_2_bn[0][0]']
ation)
conv4_block3_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block3_2_relu[0][0]']
D)
conv4_block3_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block3_3_conv[0][0]']
rmalization)
conv4_block3_add (Add) (None, 14, 14, 1024) 0 ['conv4_block2_out[0][0]',
'conv4_block3_3_bn[0][0]']
conv4_block3_out (Activati (None, 14, 14, 1024) 0 ['conv4_block3_add[0][0]']
on)
conv4_block4_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block3_out[0][0]']
D)
conv4_block4_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block4_1_conv[0][0]']
rmalization)
conv4_block4_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block4_1_bn[0][0]']
ation)
conv4_block4_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block4_1_relu[0][0]']
D)
conv4_block4_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block4_2_conv[0][0]']
rmalization)
conv4_block4_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block4_2_bn[0][0]']
ation)
conv4_block4_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block4_2_relu[0][0]']
D)
conv4_block4_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block4_3_conv[0][0]']
rmalization)
conv4_block4_add (Add) (None, 14, 14, 1024) 0 ['conv4_block3_out[0][0]',
'conv4_block4_3_bn[0][0]']
conv4_block4_out (Activati (None, 14, 14, 1024) 0 ['conv4_block4_add[0][0]']
on)
conv4_block5_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block4_out[0][0]']
D)
conv4_block5_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block5_1_conv[0][0]']
rmalization)
conv4_block5_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block5_1_bn[0][0]']
ation)
conv4_block5_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block5_1_relu[0][0]']
D)
conv4_block5_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block5_2_conv[0][0]']
rmalization)
conv4_block5_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block5_2_bn[0][0]']
ation)
conv4_block5_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block5_2_relu[0][0]']
D)
conv4_block5_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block5_3_conv[0][0]']
rmalization)
conv4_block5_add (Add) (None, 14, 14, 1024) 0 ['conv4_block4_out[0][0]',
'conv4_block5_3_bn[0][0]']
conv4_block5_out (Activati (None, 14, 14, 1024) 0 ['conv4_block5_add[0][0]']
on)
conv4_block6_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block5_out[0][0]']
D)
conv4_block6_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block6_1_conv[0][0]']
rmalization)
conv4_block6_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block6_1_bn[0][0]']
ation)
conv4_block6_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block6_1_relu[0][0]']
D)
conv4_block6_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block6_2_conv[0][0]']
rmalization)
conv4_block6_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block6_2_bn[0][0]']
ation)
conv4_block6_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block6_2_relu[0][0]']
D)
conv4_block6_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block6_3_conv[0][0]']
rmalization)
conv4_block6_add (Add) (None, 14, 14, 1024) 0 ['conv4_block5_out[0][0]',
'conv4_block6_3_bn[0][0]']
conv4_block6_out (Activati (None, 14, 14, 1024) 0 ['conv4_block6_add[0][0]']
on)
conv5_block1_1_conv (Conv2 (None, 7, 7, 512) 524800 ['conv4_block6_out[0][0]']
D)
conv5_block1_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block1_1_conv[0][0]']
rmalization)
conv5_block1_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block1_1_bn[0][0]']
ation)
conv5_block1_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block1_1_relu[0][0]']
D)
conv5_block1_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block1_2_conv[0][0]']
rmalization)
conv5_block1_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block1_2_bn[0][0]']
ation)
conv5_block1_0_conv (Conv2 (None, 7, 7, 2048) 2099200 ['conv4_block6_out[0][0]']
D)
conv5_block1_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block1_2_relu[0][0]']
D)
conv5_block1_0_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block1_0_conv[0][0]']
rmalization)
conv5_block1_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block1_3_conv[0][0]']
rmalization)
conv5_block1_add (Add) (None, 7, 7, 2048) 0 ['conv5_block1_0_bn[0][0]',
'conv5_block1_3_bn[0][0]']
conv5_block1_out (Activati (None, 7, 7, 2048) 0 ['conv5_block1_add[0][0]']
on)
conv5_block2_1_conv (Conv2 (None, 7, 7, 512) 1049088 ['conv5_block1_out[0][0]']
D)
conv5_block2_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block2_1_conv[0][0]']
rmalization)
conv5_block2_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block2_1_bn[0][0]']
ation)
conv5_block2_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block2_1_relu[0][0]']
D)
conv5_block2_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block2_2_conv[0][0]']
rmalization)
conv5_block2_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block2_2_bn[0][0]']
ation)
conv5_block2_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block2_2_relu[0][0]']
D)
conv5_block2_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block2_3_conv[0][0]']
rmalization)
conv5_block2_add (Add) (None, 7, 7, 2048) 0 ['conv5_block1_out[0][0]',
'conv5_block2_3_bn[0][0]']
conv5_block2_out (Activati (None, 7, 7, 2048) 0 ['conv5_block2_add[0][0]']
on)
conv5_block3_1_conv (Conv2 (None, 7, 7, 512) 1049088 ['conv5_block2_out[0][0]']
D)
conv5_block3_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block3_1_conv[0][0]']
rmalization)
conv5_block3_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block3_1_bn[0][0]']
ation)
conv5_block3_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block3_1_relu[0][0]']
D)
conv5_block3_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block3_2_conv[0][0]']
rmalization)
conv5_block3_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block3_2_bn[0][0]']
ation)
conv5_block3_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block3_2_relu[0][0]']
D)
conv5_block3_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block3_3_conv[0][0]']
rmalization)
conv5_block3_add (Add) (None, 7, 7, 2048) 0 ['conv5_block2_out[0][0]',
'conv5_block3_3_bn[0][0]']
conv5_block3_out (Activati (None, 7, 7, 2048) 0 ['conv5_block3_add[0][0]']
on)
global_average_pooling2d_2 (None, 2048) 0 ['conv5_block3_out[0][0]']
(GlobalAveragePooling2D)
dense_7 (Dense) (None, 512) 1049088 ['global_average_pooling2d_2[0
][0]']
dropout_3 (Dropout) (None, 512) 0 ['dense_7[0][0]']
dense_8 (Dense) (None, 196) 100548 ['dropout_3[0][0]']
==================================================================================================
Total params: 24737348 (94.37 MB)
Trainable params: 1149636 (4.39 MB)
Non-trainable params: 23587712 (89.98 MB)
__________________________________________________________________________________________________
epochs = 10
batch_size=16
steps_per_epoch = len(df_train) // batch_size
validation_steps = len(df_val) // batch_size
resnet_history = resnet_model.fit(
train_generator,
steps_per_epoch=steps_per_epoch,
validation_data=val_generator,
validation_steps=validation_steps,
epochs=epochs
)
Epoch 1/10
407/407 [==============================] - 39s 75ms/step - loss: 5.3967 - accuracy: 0.0046 - val_loss: 5.2907 - val_accuracy: 0.0056
Epoch 2/10
407/407 [==============================] - 26s 64ms/step - loss: 5.2854 - accuracy: 0.0055 - val_loss: 5.2827 - val_accuracy: 0.0050
Epoch 3/10
407/407 [==============================] - 26s 63ms/step - loss: 5.2776 - accuracy: 0.0066 - val_loss: 5.2804 - val_accuracy: 0.0043
Epoch 4/10
407/407 [==============================] - 25s 62ms/step - loss: 5.2732 - accuracy: 0.0077 - val_loss: 5.2814 - val_accuracy: 0.0068
Epoch 5/10
407/407 [==============================] - 24s 60ms/step - loss: 5.2693 - accuracy: 0.0071 - val_loss: 5.2802 - val_accuracy: 0.0068
Epoch 6/10
407/407 [==============================] - 23s 57ms/step - loss: 5.2646 - accuracy: 0.0098 - val_loss: 5.2784 - val_accuracy: 0.0074
Epoch 7/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2634 - accuracy: 0.0088 - val_loss: 5.2754 - val_accuracy: 0.0081
Epoch 8/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2569 - accuracy: 0.0094 - val_loss: 5.2747 - val_accuracy: 0.0074
Epoch 9/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2529 - accuracy: 0.0089 - val_loss: 5.2689 - val_accuracy: 0.0081
Epoch 10/10
407/407 [==============================] - 21s 52ms/step - loss: 5.2463 - accuracy: 0.0105 - val_loss: 5.2723 - val_accuracy: 0.0074
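As a sanity check on the training log, it helps to know what a model that has learned nothing would score. A classifier that assigns equal probability to each of the 196 classes has a cross-entropy loss of ln(196) and an expected accuracy of 1/196. The sketch below computes both with the standard library; the class count is taken from the final `Dense(196)` layer, and `chance_level` is an illustrative helper name, not part of the notebook.

```python
import math

def chance_level(num_classes):
    """Return (cross-entropy loss, accuracy) of a uniform random classifier."""
    uniform_loss = math.log(num_classes)   # equals -ln(1 / num_classes)
    uniform_accuracy = 1.0 / num_classes
    return uniform_loss, uniform_accuracy

loss, acc = chance_level(196)
print(f"chance-level loss: {loss:.4f}, chance-level accuracy: {acc:.4f}")
```

The logged losses (roughly 5.25 to 5.29 after the first epoch) sit at or just below this value, which suggests the classification head has barely moved away from uniform predictions.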
#accuracy loss graph
plot_training_history(resnet_history)
y_pred, y_true,df_resnet_classification_report = generate_classification_report_tf_model(
model=resnet_model,
df_val=df_val,
label_encoder=label_encoder,
preprocess_fn=resnet_preprocess,
batch_size=32,
report_name="resnet_classification_report.csv"
)
51/51 [==============================] - 7s 43ms/step
Model Accuracy: 0.0074
Classification Report:
Report saved as: resnet_classification_report.csv
Model Accuracy: 0.0074
Average Summary Metrics:
precision recall f1-score
macro avg 0.964518 0.007240 0.000442
weighted avg 0.968385 0.007366 0.000582
overall_accuracy 0.007366 NaN NaN
# Compute confusion matrix
df_support = df_resnet_classification_report.iloc[:-3] # exclude average rows
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
top_10_indices = [np.where(label_encoder.classes_ == cls)[0][0] for cls in top_10_classes]
# Use the labels and predictions returned by the ResNet evaluation above
resnet_cm = confusion_matrix(y_true, y_pred)
resnet_cm_top10 = resnet_cm[np.ix_(top_10_indices, top_10_indices)]
# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(resnet_cm_top10, annot=True, fmt='d',
xticklabels=top_10_classes, yticklabels=top_10_classes,
cmap='Blues')
#sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=label_encoder.classes_, yticklabels=label_encoder.classes_)
plt.title("ResNet Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()
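The top-10 selection relies on `np.ix_` to pick the same set of rows and columns from the full confusion matrix. A dependency-free sketch of that selection, on made-up labels rather than the project's data, may make the indexing easier to follow. `confusion_counts` and `submatrix` are illustrative helpers, not functions from the notebook.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, n_classes):
    """Build an n_classes x n_classes matrix: rows are true labels, columns are predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def submatrix(matrix, indices):
    """Keep only the given rows and columns, in the given order (the role np.ix_ plays)."""
    return [[matrix[r][c] for c in indices] for r in indices]

# Toy data with 4 classes (hypothetical values).
y_true = [0, 0, 0, 1, 1, 2, 3, 3, 3, 3]
y_pred = [0, 1, 0, 1, 3, 2, 3, 3, 0, 3]

cm = confusion_counts(y_true, y_pred, n_classes=4)

# Rank classes by support (number of true samples) and keep the top 2.
support = Counter(y_true)
top_classes = [cls for cls, _ in support.most_common(2)]

print(top_classes)                  # classes with the most true samples
print(submatrix(cm, top_classes))   # confusion counts restricted to those classes
```

Predictions that fall outside the selected classes are not visible in the reduced matrix, so a sparse top-10 heatmap does not by itself mean few errors.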
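Observation:
The model is not performing well:
- Extremely low accuracy (0.74%), barely above the chance level of 1/196 ≈ 0.51%; the loss stays near ln(196) ≈ 5.28, the value for uniform predictions
- High precision with very low recall; precision this high next to a near-zero F1-score likely means most classes are never predicted and their undefined precision is reported as 1
Further actions could be:
- Check for data imbalance
- Fine-tune parameters and retrain the model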
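To follow up on the data-imbalance check, a minimal sketch: count the samples per class, report the ratio between the largest and smallest class, and compute "balanced" weights as n_samples / (n_classes * class_count), the formula scikit-learn documents for `compute_class_weight`. The label list is toy data and `balanced_class_weights` is a hypothetical helper name.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count), so rare classes weigh more."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy labels (hypothetical): one frequent, one medium, one rare class.
labels = ["audi"] * 6 + ["bmw"] * 3 + ["ford"] * 1

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
weights = balanced_class_weights(labels)

print(f"largest/smallest class ratio: {imbalance_ratio:.1f}")
for cls, w in sorted(weights.items()):
    print(f"{cls}: count={counts[cls]}, weight={w:.3f}")
```

A ratio close to 1 would suggest that imbalance is not the main cause of the low accuracy and that other explanations deserve attention first.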
7. Intermediate Summary - Further Steps
GoogLeNet and ResNet will undergo hyperparameter tuning in the next milestone, since both show signs of data imbalance and their accuracy remains low relative to the loss.
MobileNet and AlexNet are lightweight, shallow models, so they are dropped from further fine-tuning and from comparison with the other models.
8. Fine Tuning
Fine Tuning Of GoogleNet Model
from tensorflow.keras.mixed_precision import set_global_policy
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint,ReduceLROnPlateau
import random
set_global_policy('mixed_float16')
INFO:tensorflow:Mixed precision compatibility check (mixed_float16): OK Your GPU will likely run quickly with dtype policy mixed_float16 as it has compute capability of at least 7.0. Your GPU: NVIDIA A10G, compute capability 8.6
print("TF Version:", tf.__version__)
print("GPU Available:", tf.config.list_physical_devices('GPU'))
TF Version: 2.16.2 GPU Available: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
unique_classes = df_training['labels'].unique()
base_model = InceptionV3(
weights='imagenet',
include_top=False,
input_shape=(224, 224, 3)
)
# Freeze everything except the last 50 layers of the backbone
for layer in base_model.layers[:-50]:
    layer.trainable = False
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(512, activation='relu')(x)
x = tf.keras.layers.Dropout(0.5)(x)
output = layers.Dense(196, activation='softmax', dtype='float32')(x) # Force output to float32
googlenet_model_tuned = Model(inputs=base_model.input, outputs=output)
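googlenet_model_tuned.compile(
    optimizer=Adam(learning_rate=1e-3),
    loss='sparse_categorical_crossentropy',  # labels are integer class indices
    metrics=[
        'accuracy',
        # Integer labels need the sparse top-k metric; TopKCategoricalAccuracy expects one-hot labels.
        # The name is set explicitly so the history keys stay 'top_k_categorical_accuracy'.
        tf.keras.metrics.SparseTopKCategoricalAccuracy(k=5, name='top_k_categorical_accuracy')
    ]
)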
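Because the labels in this section are integer class indices (the loss is `sparse_categorical_crossentropy`), a top-k metric has to compare each integer label against the k highest-scoring classes. The dependency-free sketch below shows that computation on toy scores. It illustrates the idea behind Keras's `SparseTopKCategoricalAccuracy` and is not its implementation; the function name and data are made up for the example.

```python
def sparse_top_k_accuracy(y_true, scores, k=5):
    """Fraction of samples whose integer label is among the k highest-scoring classes."""
    hits = 0
    for label, row in zip(y_true, scores):
        # Class indices sorted by descending score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(y_true)

# Toy scores for 3 samples over 4 classes (hypothetical values).
scores = [
    [0.10, 0.50, 0.30, 0.10],
    [0.70, 0.15, 0.10, 0.05],
    [0.25, 0.15, 0.50, 0.10],
]
y_true = [2, 3, 0]  # integer class indices, not one-hot vectors

top1 = sparse_top_k_accuracy(y_true, scores, k=1)
top2 = sparse_top_k_accuracy(y_true, scores, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

A top-k value can never be lower than the top-1 accuracy on the same data, so a logged top-5 accuracy of zero next to a non-zero accuracy points to a metric and label-format mismatch rather than to model behaviour.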
batch_size = 16
use_augmentation = True
df_split = df_training.drop(columns=['image']).copy()
df_train_googlenet, df_val_googlenet = train_test_split( df_split, test_size=0.2, random_state=42)
train_paths = df_train_googlenet["Image_Path"].values
train_labels = np.array([np.argmax(label) for label in df_train_googlenet["label_categorical"]])
val_paths = df_val_googlenet["Image_Path"].values
val_labels = np.array([np.argmax(label) for label in df_val_googlenet["label_categorical"]])
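# Images are already scaled to [0, 1] in load_and_preprocess, so no Rescaling layer is used here;
# rescaling twice would shrink training pixels to [0, 1/255] while validation stays in [0, 1].
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomContrast(0.1),
    layers.RandomTranslation(0.1, 0.1)
])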
def load_and_preprocess(path, label):
    # Read, decode, resize, and scale an image to [0, 1]
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [224, 224])
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

def load_preprocess_with_augment(path, label):
    # Same as load_and_preprocess, followed by random augmentation
    image, label = load_and_preprocess(path, label)
    image = data_augmentation(image)
    return image, label
# Training dataset
train_ds = tf.data.Dataset.from_tensor_slices((train_paths, train_labels)).shuffle(1000)
# Apply map function based on flag
if use_augmentation:
    train_ds = train_ds.map(load_preprocess_with_augment, num_parallel_calls=tf.data.AUTOTUNE)
else:
    train_ds = train_ds.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
train_ds = train_ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
val_ds = tf.data.Dataset.from_tensor_slices((val_paths, val_labels)) \
.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE) \
.batch(batch_size) \
.prefetch(tf.data.AUTOTUNE)
callbacks = [
EarlyStopping(
monitor='val_loss',
patience=30,
restore_best_weights=True,
verbose=1
),
ModelCheckpoint(
filepath='googlenet_finetuned_best.keras',
monitor='val_loss',
save_best_only=True,
verbose=1
),
ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7, verbose=1)
]
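# Integer class index for every training sample
train_class_indices = np.array([np.argmax(label) for label in df_train_googlenet["label_categorical"]])
# Compute balanced class weights for the classes present in the training split
present_classes = np.unique(train_class_indices)
class_weights_array = compute_class_weight(
    class_weight='balanced',
    classes=present_classes,
    y=train_class_indices
)
# Map class index -> weight; zip keeps the mapping correct even if a class is absent from the split
class_weights = dict(zip(present_classes.tolist(), class_weights_array.tolist()))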
history = googlenet_model_tuned.fit(
    train_ds,
    validation_data=val_ds,
    epochs=20,
    callbacks=callbacks,
    class_weight=class_weights  # compensate for class imbalance
)
Epoch 1/20
WARNING:tensorflow:AutoGraph could not transform <function create_autocast_variable at 0x7ff42863a950> and will run it as-is. Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: <gast.gast.Expr object at 0x7ff2ddb90820> To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
Epoch 1: val_loss improved from inf to 5.28157, saving model to googlenet_finetuned_best.keras
408/408 [==============================] - 45s 58ms/step - loss: 5.3014 - accuracy: 0.0025 - top_k_categorical_accuracy: 0.0180 - val_loss: 5.2816 - val_accuracy: 0.0037 - val_top_k_categorical_accuracy: 0.0589 - lr: 0.0010
Epoch 2/20
Epoch 2: val_loss did not improve from 5.28157
408/408 [==============================] - 13s 32ms/step - loss: 5.2798 - accuracy: 0.0025 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2821 - val_accuracy: 0.0043 - val_top_k_categorical_accuracy: 0.0552 - lr: 0.0010
Epoch 3/20
Epoch 3: val_loss did not improve from 5.28157
408/408 [==============================] - 14s 34ms/step - loss: 5.2798 - accuracy: 0.0052 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2821 - val_accuracy: 0.0043 - val_top_k_categorical_accuracy: 0.0473 - lr: 0.0010
Epoch 4/20
Epoch 4: val_loss did not improve from 5.28157
408/408 [==============================] - 13s 33ms/step - loss: 5.2798 - accuracy: 0.0049 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2822 - val_accuracy: 0.0055 - val_top_k_categorical_accuracy: 0.0503 - lr: 0.0010
Epoch 5/20
Epoch 5: val_loss did not improve from 5.28157
408/408 [==============================] - 13s 33ms/step - loss: 5.2798 - accuracy: 0.0029 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2820 - val_accuracy: 0.0049 - val_top_k_categorical_accuracy: 0.0571 - lr: 0.0010
Epoch 6/20
Epoch 6: val_loss improved from 5.28157 to 5.28134, saving model to googlenet_finetuned_best.keras
408/408 [==============================] - 15s 36ms/step - loss: 5.2800 - accuracy: 0.0031 - top_k_categorical_accuracy: 0.0054 - val_loss: 5.2813 - val_accuracy: 0.0031 - val_top_k_categorical_accuracy: 0.0061 - lr: 0.0010
Epoch 7/20
Epoch 7: val_loss did not improve from 5.28134
408/408 [==============================] - 14s 33ms/step - loss: 5.2798 - accuracy: 0.0041 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2821 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0098 - lr: 0.0010
Epoch 8/20
Epoch 8: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2798 - accuracy: 0.0020 - top_k_categorical_accuracy: 0.2534 - val_loss: 5.2820 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0276 - lr: 0.0010
Epoch 9/20
Epoch 9: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2798 - accuracy: 0.0029 - top_k_categorical_accuracy: 0.0098 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0258 - lr: 0.0010
Epoch 10/20
Epoch 10: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2798 - accuracy: 0.0017 - top_k_categorical_accuracy: 0.0295 - val_loss: 5.2819 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0221 - lr: 0.0010
Epoch 11/20
Epoch 11: val_loss did not improve from 5.28134
Epoch 11: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257.
408/408 [==============================] - 13s 33ms/step - loss: 5.2798 - accuracy: 0.0037 - top_k_categorical_accuracy: 0.0025 - val_loss: 5.2819 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0233 - lr: 0.0010
Epoch 12/20
Epoch 12: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 33ms/step - loss: 5.2789 - accuracy: 0.0028 - top_k_categorical_accuracy: 0.0442 - val_loss: 5.2819 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0233 - lr: 5.0000e-04
Epoch 13/20
Epoch 13: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2790 - accuracy: 0.0023 - top_k_categorical_accuracy: 0.0417 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0233 - lr: 5.0000e-04
Epoch 14/20
Epoch 14: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2789 - accuracy: 0.0035 - top_k_categorical_accuracy: 0.0270 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0203 - lr: 5.0000e-04
Epoch 15/20
Epoch 15: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 33ms/step - loss: 5.2790 - accuracy: 0.0037 - top_k_categorical_accuracy: 0.2579 - val_loss: 5.2819 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0184 - lr: 5.0000e-04
Epoch 16/20
Epoch 16: val_loss did not improve from 5.28134
Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628.
408/408 [==============================] - 13s 33ms/step - loss: 5.2789 - accuracy: 0.0045 - top_k_categorical_accuracy: 0.1277 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0203 - lr: 5.0000e-04
Epoch 17/20
Epoch 17: val_loss did not improve from 5.28134
408/408 [==============================] - 14s 33ms/step - loss: 5.2785 - accuracy: 0.0048 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0203 - lr: 2.5000e-04
Epoch 18/20
Epoch 18: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2785 - accuracy: 0.0043 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0196 - lr: 2.5000e-04
Epoch 19/20
Epoch 19: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2785 - accuracy: 0.0043 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0203 - lr: 2.5000e-04
Epoch 20/20
Epoch 20: val_loss did not improve from 5.28134
408/408 [==============================] - 13s 32ms/step - loss: 5.2785 - accuracy: 0.0048 - top_k_categorical_accuracy: 0.0000e+00 - val_loss: 5.2818 - val_accuracy: 0.0018 - val_top_k_categorical_accuracy: 0.0166 - lr: 2.5000e-04
Restoring model weights from the end of the best epoch: 6.
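The two learning-rate reductions in the log (epochs 11 and 16) follow from the `ReduceLROnPlateau` settings used here: `patience=5`, `factor=0.5`, and a minimum improvement threshold. The sketch below replays the logged `val_loss` values through a simplified version of that rule. It assumes the callback's default `min_delta=1e-4` and no cooldown, and `simulate_reduce_lr` is an illustrative function, not Keras code.

```python
def simulate_reduce_lr(val_losses, lr, factor=0.5, patience=5, min_delta=1e-4, min_lr=1e-7):
    """Replay a plateau rule: cut lr by `factor` after `patience` epochs without improvement."""
    best = float("inf")
    wait = 0
    reductions = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Counts as an improvement only if it beats the best value by min_delta.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                lr = max(lr * factor, min_lr)
                reductions.append((epoch, lr))
                wait = 0
    return reductions

# val_loss per epoch, copied from the training log above (rounded to 4 decimals).
val_losses = [
    5.2816, 5.2821, 5.2821, 5.2822, 5.2820, 5.2813, 5.2821, 5.2820, 5.2818, 5.2819,
    5.2819, 5.2819, 5.2818, 5.2818, 5.2819, 5.2818, 5.2818, 5.2818, 5.2818, 5.2818,
]

reductions = simulate_reduce_lr(val_losses, lr=1e-3)
for epoch, new_lr in reductions:
    print(f"epoch {epoch}: learning rate reduced to {new_lr}")
```

Halving the learning rate did not change the validation loss in this run, which is consistent with the plateau not being caused by too large a step size alone.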
Train Val Loss Graph
plot_training_history(history)
# Get all predictions and true labels
y_true = []
y_pred = []
for X_batch, y_batch in val_ds:
    # verbose=0 suppresses the per-batch progress bar
    preds = googlenet_model_tuned.predict(X_batch, verbose=0)
    y_pred_batch = np.argmax(preds, axis=1)
    y_true_batch = y_batch.numpy() if hasattr(y_batch, "numpy") else y_batch
    y_true.extend(y_true_batch)
    y_pred.extend(y_pred_batch)
target_names = label_encoder.classes_ if 'label_encoder' in globals() else None
report = classification_report(
y_true, y_pred,
target_names=target_names,
output_dict=True,
zero_division=1
)
df_googlenet_tuned_report = pd.DataFrame(report).transpose()
acc = accuracy_score(y_true, y_pred)
df_googlenet_tuned_report.loc["overall_accuracy"] = [acc, None, None, None]
df_googlenet_tuned_report.to_csv("googlenet_tuned_classification_report.csv")
print(f"Tuned GoogLeNet Accuracy: {acc:.4f}")
print("Average Summary Metrics:")
print(df_googlenet_tuned_report.tail(3)[["precision", "recall", "f1-score"]])
Tuned GoogLeNet Accuracy: 0.0031
Average Summary Metrics:
precision recall f1-score
macro avg 0.689119 0.002145 0.000543
weighted avg 0.676990 0.003069 0.000794
overall_accuracy 0.003069 NaN NaN
Confusion matrix for the tuned model
cm = confusion_matrix(y_true, y_pred)
# Keep only the per-class rows; the report also contains summary rows
summary_rows = ["accuracy", "macro avg", "weighted avg", "overall_accuracy"]
df_support = df_googlenet_tuned_report.drop(index=summary_rows, errors="ignore")
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
# Get class indices (map from class name to index)
if target_names is not None:
    top_10_indices = [np.where(target_names == cls)[0][0] for cls in top_10_classes]
else:
    top_10_indices = list(map(int, top_10_classes)) # fallback if no class names
cm_top10 = cm[np.ix_(top_10_indices, top_10_indices)]
# Plot
plt.figure(figsize=(10, 8))
sns.heatmap(cm_top10, annot=True, fmt='d',
xticklabels=top_10_classes,
yticklabels=top_10_classes,
cmap='Blues')
plt.title("Tuned GoogLeNet - Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
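The sub-matrix selection above relies on row `i` of the confusion matrix belonging to class index `i`. With scikit-learn's `confusion_matrix` that holds only when every class appears in the true or predicted labels, unless the `labels` argument is passed explicitly. The following is a minimal pure-Python sketch of a fixed-size matrix and the top-class slice; the helper names and the toy data are hypothetical and not part of the notebook.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes matrix; rows are true labels, columns are predictions."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def submatrix(cm, indices):
    """Keep only the rows and columns listed in `indices`, in that order."""
    return [[cm[i][j] for j in indices] for i in indices]


# Toy data: 5 classes, but class 3 never occurs in y_true or y_pred.
y_true = [0, 0, 1, 2, 4, 4]
y_pred = [0, 1, 1, 2, 4, 0]

cm = confusion_counts(y_true, y_pred, n_classes=5)
top = submatrix(cm, [4, 0])  # rows/columns for classes 4 and 0
print(top)
```

Because the matrix always has one row per class, index 4 still refers to class 4 even though class 3 is absent from the data.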
GoogLeNet Fine-Tuned Summary¶
- The untuned GoogLeNet model reached an accuracy of 24.7%, while the tuned model scored only 0.31%, which is below the chance level of about 0.51% (1/196) for 196 classes.
- Macro and weighted precision look high (0.69 and 0.68), but this is most likely an artifact of zero_division=1, which scores every class the model never predicts with a precision of 1.0. Recall and F1-scores are close to zero, confirming that the model rarely makes a correct prediction.
- Tuning therefore did not improve the measured performance and may have disrupted learning. Likely causes are preprocessing inconsistencies, a label-mapping mismatch, or the fine-tuning strategy itself.
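To see how high macro precision can coexist with near-zero recall, the sketch below recomputes macro precision in plain Python. The function and the toy labels are hypothetical; it only mimics the rule that a class with no predicted samples receives the `zero_division` value as its precision.

```python
def macro_precision(y_true, y_pred, n_classes, zero_division):
    """Macro-averaged precision; classes with no predictions get `zero_division`."""
    per_class = []
    for c in range(n_classes):
        predicted = sum(1 for p in y_pred if p == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        per_class.append(correct / predicted if predicted else zero_division)
    return sum(per_class) / n_classes


# Toy case: four classes, and the model predicts class 0 for every sample.
y_true = [0, 1, 2, 3]
y_pred = [0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
inflated = macro_precision(y_true, y_pred, 4, zero_division=1.0)
strict = macro_precision(y_true, y_pred, 4, zero_division=0.0)
print(accuracy, inflated, strict)
```

With three of the four classes never predicted, the `zero_division=1.0` setting lifts macro precision far above the accuracy, whereas `zero_division=0.0` keeps it low.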
Fine Tuning Of ResNet Model¶
Handling class imbalance
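# Encode labels as integer class indices
label_encoder = LabelEncoder()
df_training['labels_encoded'] = label_encoder.fit_transform(df_training['labels'])
# flow_from_dataframe expects string labels when class_mode='categorical'
df_training['labels'] = df_training['labels'].astype(str)
df_val['labels'] = df_val['labels'].astype(str)
# Calculate balanced class weights (important for class imbalance)
class_weights = class_weight.compute_class_weight(
    'balanced',
    classes=np.unique(df_training['labels_encoded']),
    y=df_training['labels_encoded']
)
# Convert to a {class_index: weight} dictionary. Keras matches these keys against the
# generator's class indices, so the encoder order must agree with train_generator.class_indices.
class_weights = dict(enumerate(class_weights))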
batch_size = 32
image_size = (224, 224)
train_datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
    #,preprocessing_function=resnet_preprocess
)
train_generator = train_datagen.flow_from_dataframe(
    df_training,
    x_col='Image_Path',
    y_col='labels',
    target_size=image_size,
    batch_size=batch_size,
    class_mode='categorical',
    seed=42
)
Found 8144 validated image filenames belonging to 196 classes.
#val_datagen = ImageDataGenerator(preprocessing_function=resnet_preprocess)
val_datagen = ImageDataGenerator()
val_generator = val_datagen.flow_from_dataframe(
    df_val,
    x_col='Image_Path',
    y_col='labels',
    target_size=image_size,
    batch_size=batch_size,
    class_mode='categorical',
    seed=42
)
Found 1629 validated image filenames belonging to 196 classes.
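Keras applies `class_weight` by class index, so the weights are only meaningful if they are keyed the same way the generator numbers its classes. As a hedged illustration, the sketch below computes "balanced" weights (n_samples / (n_classes * class_count)) directly from a `{class_name: index}` mapping shaped like `train_generator.class_indices`. The helper name, labels, and mapping are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, class_indices):
    """Return {class_index: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(class_indices)
    return {
        index: n_samples / (n_classes * counts[name])
        for name, index in class_indices.items()
    }


# Hypothetical labels and a hypothetical class_indices mapping shaped like
# the one a Keras generator exposes ({class_name: index}).
labels = ["suv", "suv", "suv", "coupe"]
class_indices = {"coupe": 0, "suv": 1}

weights = balanced_class_weights(labels, class_indices)
print(weights)
```

Keying the dictionary by the generator's own indices removes any dependence on how a separate label encoder happened to order the classes.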
model definition
# Load ResNet50 base model without the top layer
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze the whole base model, then unfreeze only the last 40 layers for fine-tuning
for layer in base_model.layers:
    layer.trainable = False
for layer in base_model.layers[-40:]:
    layer.trainable = True
num_classes = df_training['labels'].nunique()
# Add custom classification layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
x = Dense(num_classes, activation='softmax')(x) # Output layer
# Define model
resnet_tuned_model = Model(inputs=base_model.input, outputs=x)
# Compile with a low learning rate (1e-5), as is usual when fine-tuning pretrained layers
resnet_tuned_model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
resnet_tuned_model.summary()
Model: "model_4"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_5 (InputLayer) [(None, 224, 224, 3)] 0 []
conv1_pad (ZeroPadding2D) (None, 230, 230, 3) 0 ['input_5[0][0]']
conv1_conv (Conv2D) (None, 112, 112, 64) 9472 ['conv1_pad[0][0]']
conv1_bn (BatchNormalizati (None, 112, 112, 64) 256 ['conv1_conv[0][0]']
on)
conv1_relu (Activation) (None, 112, 112, 64) 0 ['conv1_bn[0][0]']
pool1_pad (ZeroPadding2D) (None, 114, 114, 64) 0 ['conv1_relu[0][0]']
pool1_pool (MaxPooling2D) (None, 56, 56, 64) 0 ['pool1_pad[0][0]']
conv2_block1_1_conv (Conv2 (None, 56, 56, 64) 4160 ['pool1_pool[0][0]']
D)
conv2_block1_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block1_1_conv[0][0]']
rmalization)
conv2_block1_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block1_1_bn[0][0]']
ation)
conv2_block1_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block1_1_relu[0][0]']
D)
conv2_block1_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block1_2_conv[0][0]']
rmalization)
conv2_block1_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block1_2_bn[0][0]']
ation)
conv2_block1_0_conv (Conv2 (None, 56, 56, 256) 16640 ['pool1_pool[0][0]']
D)
conv2_block1_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block1_2_relu[0][0]']
D)
conv2_block1_0_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block1_0_conv[0][0]']
rmalization)
conv2_block1_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block1_3_conv[0][0]']
rmalization)
conv2_block1_add (Add) (None, 56, 56, 256) 0 ['conv2_block1_0_bn[0][0]',
'conv2_block1_3_bn[0][0]']
conv2_block1_out (Activati (None, 56, 56, 256) 0 ['conv2_block1_add[0][0]']
on)
conv2_block2_1_conv (Conv2 (None, 56, 56, 64) 16448 ['conv2_block1_out[0][0]']
D)
conv2_block2_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block2_1_conv[0][0]']
rmalization)
conv2_block2_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block2_1_bn[0][0]']
ation)
conv2_block2_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block2_1_relu[0][0]']
D)
conv2_block2_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block2_2_conv[0][0]']
rmalization)
conv2_block2_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block2_2_bn[0][0]']
ation)
conv2_block2_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block2_2_relu[0][0]']
D)
conv2_block2_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block2_3_conv[0][0]']
rmalization)
conv2_block2_add (Add) (None, 56, 56, 256) 0 ['conv2_block1_out[0][0]',
'conv2_block2_3_bn[0][0]']
conv2_block2_out (Activati (None, 56, 56, 256) 0 ['conv2_block2_add[0][0]']
on)
conv2_block3_1_conv (Conv2 (None, 56, 56, 64) 16448 ['conv2_block2_out[0][0]']
D)
conv2_block3_1_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block3_1_conv[0][0]']
rmalization)
conv2_block3_1_relu (Activ (None, 56, 56, 64) 0 ['conv2_block3_1_bn[0][0]']
ation)
conv2_block3_2_conv (Conv2 (None, 56, 56, 64) 36928 ['conv2_block3_1_relu[0][0]']
D)
conv2_block3_2_bn (BatchNo (None, 56, 56, 64) 256 ['conv2_block3_2_conv[0][0]']
rmalization)
conv2_block3_2_relu (Activ (None, 56, 56, 64) 0 ['conv2_block3_2_bn[0][0]']
ation)
conv2_block3_3_conv (Conv2 (None, 56, 56, 256) 16640 ['conv2_block3_2_relu[0][0]']
D)
conv2_block3_3_bn (BatchNo (None, 56, 56, 256) 1024 ['conv2_block3_3_conv[0][0]']
rmalization)
conv2_block3_add (Add) (None, 56, 56, 256) 0 ['conv2_block2_out[0][0]',
'conv2_block3_3_bn[0][0]']
conv2_block3_out (Activati (None, 56, 56, 256) 0 ['conv2_block3_add[0][0]']
on)
conv3_block1_1_conv (Conv2 (None, 28, 28, 128) 32896 ['conv2_block3_out[0][0]']
D)
conv3_block1_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block1_1_conv[0][0]']
rmalization)
conv3_block1_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block1_1_bn[0][0]']
ation)
conv3_block1_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block1_1_relu[0][0]']
D)
conv3_block1_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block1_2_conv[0][0]']
rmalization)
conv3_block1_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block1_2_bn[0][0]']
ation)
conv3_block1_0_conv (Conv2 (None, 28, 28, 512) 131584 ['conv2_block3_out[0][0]']
D)
conv3_block1_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block1_2_relu[0][0]']
D)
conv3_block1_0_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block1_0_conv[0][0]']
rmalization)
conv3_block1_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block1_3_conv[0][0]']
rmalization)
conv3_block1_add (Add) (None, 28, 28, 512) 0 ['conv3_block1_0_bn[0][0]',
'conv3_block1_3_bn[0][0]']
conv3_block1_out (Activati (None, 28, 28, 512) 0 ['conv3_block1_add[0][0]']
on)
conv3_block2_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block1_out[0][0]']
D)
conv3_block2_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block2_1_conv[0][0]']
rmalization)
conv3_block2_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block2_1_bn[0][0]']
ation)
conv3_block2_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block2_1_relu[0][0]']
D)
conv3_block2_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block2_2_conv[0][0]']
rmalization)
conv3_block2_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block2_2_bn[0][0]']
ation)
conv3_block2_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block2_2_relu[0][0]']
D)
conv3_block2_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block2_3_conv[0][0]']
rmalization)
conv3_block2_add (Add) (None, 28, 28, 512) 0 ['conv3_block1_out[0][0]',
'conv3_block2_3_bn[0][0]']
conv3_block2_out (Activati (None, 28, 28, 512) 0 ['conv3_block2_add[0][0]']
on)
conv3_block3_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block2_out[0][0]']
D)
conv3_block3_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block3_1_conv[0][0]']
rmalization)
conv3_block3_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block3_1_bn[0][0]']
ation)
conv3_block3_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block3_1_relu[0][0]']
D)
conv3_block3_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block3_2_conv[0][0]']
rmalization)
conv3_block3_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block3_2_bn[0][0]']
ation)
conv3_block3_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block3_2_relu[0][0]']
D)
conv3_block3_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block3_3_conv[0][0]']
rmalization)
conv3_block3_add (Add) (None, 28, 28, 512) 0 ['conv3_block2_out[0][0]',
'conv3_block3_3_bn[0][0]']
conv3_block3_out (Activati (None, 28, 28, 512) 0 ['conv3_block3_add[0][0]']
on)
conv3_block4_1_conv (Conv2 (None, 28, 28, 128) 65664 ['conv3_block3_out[0][0]']
D)
conv3_block4_1_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block4_1_conv[0][0]']
rmalization)
conv3_block4_1_relu (Activ (None, 28, 28, 128) 0 ['conv3_block4_1_bn[0][0]']
ation)
conv3_block4_2_conv (Conv2 (None, 28, 28, 128) 147584 ['conv3_block4_1_relu[0][0]']
D)
conv3_block4_2_bn (BatchNo (None, 28, 28, 128) 512 ['conv3_block4_2_conv[0][0]']
rmalization)
conv3_block4_2_relu (Activ (None, 28, 28, 128) 0 ['conv3_block4_2_bn[0][0]']
ation)
conv3_block4_3_conv (Conv2 (None, 28, 28, 512) 66048 ['conv3_block4_2_relu[0][0]']
D)
conv3_block4_3_bn (BatchNo (None, 28, 28, 512) 2048 ['conv3_block4_3_conv[0][0]']
rmalization)
conv3_block4_add (Add) (None, 28, 28, 512) 0 ['conv3_block3_out[0][0]',
'conv3_block4_3_bn[0][0]']
conv3_block4_out (Activati (None, 28, 28, 512) 0 ['conv3_block4_add[0][0]']
on)
conv4_block1_1_conv (Conv2 (None, 14, 14, 256) 131328 ['conv3_block4_out[0][0]']
D)
conv4_block1_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block1_1_conv[0][0]']
rmalization)
conv4_block1_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block1_1_bn[0][0]']
ation)
conv4_block1_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block1_1_relu[0][0]']
D)
conv4_block1_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block1_2_conv[0][0]']
rmalization)
conv4_block1_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block1_2_bn[0][0]']
ation)
conv4_block1_0_conv (Conv2 (None, 14, 14, 1024) 525312 ['conv3_block4_out[0][0]']
D)
conv4_block1_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block1_2_relu[0][0]']
D)
conv4_block1_0_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block1_0_conv[0][0]']
rmalization)
conv4_block1_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block1_3_conv[0][0]']
rmalization)
conv4_block1_add (Add) (None, 14, 14, 1024) 0 ['conv4_block1_0_bn[0][0]',
'conv4_block1_3_bn[0][0]']
conv4_block1_out (Activati (None, 14, 14, 1024) 0 ['conv4_block1_add[0][0]']
on)
conv4_block2_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block1_out[0][0]']
D)
conv4_block2_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block2_1_conv[0][0]']
rmalization)
conv4_block2_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block2_1_bn[0][0]']
ation)
conv4_block2_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block2_1_relu[0][0]']
D)
conv4_block2_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block2_2_conv[0][0]']
rmalization)
conv4_block2_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block2_2_bn[0][0]']
ation)
conv4_block2_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block2_2_relu[0][0]']
D)
conv4_block2_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block2_3_conv[0][0]']
rmalization)
conv4_block2_add (Add) (None, 14, 14, 1024) 0 ['conv4_block1_out[0][0]',
'conv4_block2_3_bn[0][0]']
conv4_block2_out (Activati (None, 14, 14, 1024) 0 ['conv4_block2_add[0][0]']
on)
conv4_block3_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block2_out[0][0]']
D)
conv4_block3_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block3_1_conv[0][0]']
rmalization)
conv4_block3_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block3_1_bn[0][0]']
ation)
conv4_block3_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block3_1_relu[0][0]']
D)
conv4_block3_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block3_2_conv[0][0]']
rmalization)
conv4_block3_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block3_2_bn[0][0]']
ation)
conv4_block3_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block3_2_relu[0][0]']
D)
conv4_block3_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block3_3_conv[0][0]']
rmalization)
conv4_block3_add (Add) (None, 14, 14, 1024) 0 ['conv4_block2_out[0][0]',
'conv4_block3_3_bn[0][0]']
conv4_block3_out (Activati (None, 14, 14, 1024) 0 ['conv4_block3_add[0][0]']
on)
conv4_block4_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block3_out[0][0]']
D)
conv4_block4_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block4_1_conv[0][0]']
rmalization)
conv4_block4_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block4_1_bn[0][0]']
ation)
conv4_block4_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block4_1_relu[0][0]']
D)
conv4_block4_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block4_2_conv[0][0]']
rmalization)
conv4_block4_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block4_2_bn[0][0]']
ation)
conv4_block4_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block4_2_relu[0][0]']
D)
conv4_block4_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block4_3_conv[0][0]']
rmalization)
conv4_block4_add (Add) (None, 14, 14, 1024) 0 ['conv4_block3_out[0][0]',
'conv4_block4_3_bn[0][0]']
conv4_block4_out (Activati (None, 14, 14, 1024) 0 ['conv4_block4_add[0][0]']
on)
conv4_block5_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block4_out[0][0]']
D)
conv4_block5_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block5_1_conv[0][0]']
rmalization)
conv4_block5_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block5_1_bn[0][0]']
ation)
conv4_block5_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block5_1_relu[0][0]']
D)
conv4_block5_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block5_2_conv[0][0]']
rmalization)
conv4_block5_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block5_2_bn[0][0]']
ation)
conv4_block5_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block5_2_relu[0][0]']
D)
conv4_block5_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block5_3_conv[0][0]']
rmalization)
conv4_block5_add (Add) (None, 14, 14, 1024) 0 ['conv4_block4_out[0][0]',
'conv4_block5_3_bn[0][0]']
conv4_block5_out (Activati (None, 14, 14, 1024) 0 ['conv4_block5_add[0][0]']
on)
conv4_block6_1_conv (Conv2 (None, 14, 14, 256) 262400 ['conv4_block5_out[0][0]']
D)
conv4_block6_1_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block6_1_conv[0][0]']
rmalization)
conv4_block6_1_relu (Activ (None, 14, 14, 256) 0 ['conv4_block6_1_bn[0][0]']
ation)
conv4_block6_2_conv (Conv2 (None, 14, 14, 256) 590080 ['conv4_block6_1_relu[0][0]']
D)
conv4_block6_2_bn (BatchNo (None, 14, 14, 256) 1024 ['conv4_block6_2_conv[0][0]']
rmalization)
conv4_block6_2_relu (Activ (None, 14, 14, 256) 0 ['conv4_block6_2_bn[0][0]']
ation)
conv4_block6_3_conv (Conv2 (None, 14, 14, 1024) 263168 ['conv4_block6_2_relu[0][0]']
D)
conv4_block6_3_bn (BatchNo (None, 14, 14, 1024) 4096 ['conv4_block6_3_conv[0][0]']
rmalization)
conv4_block6_add (Add) (None, 14, 14, 1024) 0 ['conv4_block5_out[0][0]',
'conv4_block6_3_bn[0][0]']
conv4_block6_out (Activati (None, 14, 14, 1024) 0 ['conv4_block6_add[0][0]']
on)
conv5_block1_1_conv (Conv2 (None, 7, 7, 512) 524800 ['conv4_block6_out[0][0]']
D)
conv5_block1_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block1_1_conv[0][0]']
rmalization)
conv5_block1_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block1_1_bn[0][0]']
ation)
conv5_block1_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block1_1_relu[0][0]']
D)
conv5_block1_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block1_2_conv[0][0]']
rmalization)
conv5_block1_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block1_2_bn[0][0]']
ation)
conv5_block1_0_conv (Conv2 (None, 7, 7, 2048) 2099200 ['conv4_block6_out[0][0]']
D)
conv5_block1_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block1_2_relu[0][0]']
D)
conv5_block1_0_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block1_0_conv[0][0]']
rmalization)
conv5_block1_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block1_3_conv[0][0]']
rmalization)
conv5_block1_add (Add) (None, 7, 7, 2048) 0 ['conv5_block1_0_bn[0][0]',
'conv5_block1_3_bn[0][0]']
conv5_block1_out (Activati (None, 7, 7, 2048) 0 ['conv5_block1_add[0][0]']
on)
conv5_block2_1_conv (Conv2 (None, 7, 7, 512) 1049088 ['conv5_block1_out[0][0]']
D)
conv5_block2_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block2_1_conv[0][0]']
rmalization)
conv5_block2_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block2_1_bn[0][0]']
ation)
conv5_block2_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block2_1_relu[0][0]']
D)
conv5_block2_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block2_2_conv[0][0]']
rmalization)
conv5_block2_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block2_2_bn[0][0]']
ation)
conv5_block2_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block2_2_relu[0][0]']
D)
conv5_block2_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block2_3_conv[0][0]']
rmalization)
conv5_block2_add (Add) (None, 7, 7, 2048) 0 ['conv5_block1_out[0][0]',
'conv5_block2_3_bn[0][0]']
conv5_block2_out (Activati (None, 7, 7, 2048) 0 ['conv5_block2_add[0][0]']
on)
conv5_block3_1_conv (Conv2 (None, 7, 7, 512) 1049088 ['conv5_block2_out[0][0]']
D)
conv5_block3_1_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block3_1_conv[0][0]']
rmalization)
conv5_block3_1_relu (Activ (None, 7, 7, 512) 0 ['conv5_block3_1_bn[0][0]']
ation)
conv5_block3_2_conv (Conv2 (None, 7, 7, 512) 2359808 ['conv5_block3_1_relu[0][0]']
D)
conv5_block3_2_bn (BatchNo (None, 7, 7, 512) 2048 ['conv5_block3_2_conv[0][0]']
rmalization)
conv5_block3_2_relu (Activ (None, 7, 7, 512) 0 ['conv5_block3_2_bn[0][0]']
ation)
conv5_block3_3_conv (Conv2 (None, 7, 7, 2048) 1050624 ['conv5_block3_2_relu[0][0]']
D)
conv5_block3_3_bn (BatchNo (None, 7, 7, 2048) 8192 ['conv5_block3_3_conv[0][0]']
rmalization)
conv5_block3_add (Add) (None, 7, 7, 2048) 0 ['conv5_block2_out[0][0]',
'conv5_block3_3_bn[0][0]']
conv5_block3_out (Activati (None, 7, 7, 2048) 0 ['conv5_block3_add[0][0]']
on)
global_average_pooling2d_4 (None, 2048) 0 ['conv5_block3_out[0][0]']
(GlobalAveragePooling2D)
dense_11 (Dense) (None, 512) 1049088 ['global_average_pooling2d_4[0
][0]']
dropout_5 (Dropout) (None, 512) 0 ['dense_11[0][0]']
dense_12 (Dense) (None, 196) 100548 ['dropout_5[0][0]']
==================================================================================================
Total params: 24737348 (94.37 MB)
Trainable params: 16981444 (64.78 MB)
Non-trainable params: 7755904 (29.59 MB)
__________________________________________________________________________________________________
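The parameter counts of the two custom Dense layers in the summary above can be checked by hand: a fully connected layer has one weight per input-unit pair plus one bias per unit. A small sketch of that arithmetic follows; the helper name is hypothetical, and the layer sizes and totals are copied from the summary.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


hidden = dense_params(2048, 512)   # dense_11: pooled ResNet50 features -> 512 units
output = dense_params(512, 196)    # dense_12: 512 units -> 196 classes
total = 16_981_444 + 7_755_904     # trainable + non-trainable, as reported above

print(hidden, output, total)
```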
Data Generation
# Custom data generator: an alternative to class_weight that wraps a generator so it
# also yields per-sample weights. It is not used by the fit call below, which passes
# class_weight directly.
def custom_data_generator(generator, class_weights):
    while True:
        x, y = next(generator)
        sample_weights = np.array([class_weights[label] for label in np.argmax(y, axis=1)])
        yield x, y, sample_weights
# Number of batches per epoch, rounded up so the final partial batch is included
steps_per_epoch = int(np.ceil(len(df_training) / batch_size))
validation_steps = int(np.ceil(len(df_val) / batch_size))
# Define number of fine-tuning epochs
fine_tune_epochs = 20
# Define callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model_checkpoint = ModelCheckpoint('best_model_tuned_resnet.keras', save_best_only=True, monitor='val_loss')
# Train the model with callbacks; class imbalance is handled through class_weight
resnet_history_fine_tune = resnet_tuned_model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    validation_data=val_generator,
    validation_steps=validation_steps,
    epochs=fine_tune_epochs,
    callbacks=[early_stopping, model_checkpoint],
    class_weight=class_weights
)
Epoch 1/20 255/255 [==============================] - 102s 341ms/step - loss: 5.5379 - accuracy: 0.0048 - val_loss: 5.2545 - val_accuracy: 0.0086
Epoch 2/20 255/255 [==============================] - 86s 337ms/step - loss: 5.3359 - accuracy: 0.0082 - val_loss: 5.1252 - val_accuracy: 0.0233
Epoch 3/20 255/255 [==============================] - 86s 337ms/step - loss: 5.2215 - accuracy: 0.0138 - val_loss: 5.0085 - val_accuracy: 0.0442
Epoch 4/20 255/255 [==============================] - 86s 336ms/step - loss: 5.1203 - accuracy: 0.0220 - val_loss: 4.8553 - val_accuracy: 0.0675
Epoch 5/20 255/255 [==============================] - 85s 335ms/step - loss: 5.0016 - accuracy: 0.0334 - val_loss: 4.6698 - val_accuracy: 0.0958
Epoch 6/20 255/255 [==============================] - 86s 337ms/step - loss: 4.8674 - accuracy: 0.0489 - val_loss: 4.4358 - val_accuracy: 0.1363
Epoch 7/20 255/255 [==============================] - 86s 336ms/step - loss: 4.7052 - accuracy: 0.0657 - val_loss: 4.1839 - val_accuracy: 0.1817
Epoch 8/20 255/255 [==============================] - 86s 337ms/step - loss: 4.5437 - accuracy: 0.0813 - val_loss: 3.9215 - val_accuracy: 0.2382
Epoch 9/20 255/255 [==============================] - 86s 336ms/step - loss: 4.3590 - accuracy: 0.1136 - val_loss: 3.6639 - val_accuracy: 0.2830
Epoch 10/20 255/255 [==============================] - 86s 337ms/step - loss: 4.1588 - accuracy: 0.1379 - val_loss: 3.4384 - val_accuracy: 0.3217
Epoch 11/20 255/255 [==============================] - 86s 337ms/step - loss: 3.9914 - accuracy: 0.1563 - val_loss: 3.1895 - val_accuracy: 0.3610
Epoch 12/20 255/255 [==============================] - 86s 337ms/step - loss: 3.8096 - accuracy: 0.1807 - val_loss: 2.9520 - val_accuracy: 0.4211
Epoch 13/20 255/255 [==============================] - 86s 337ms/step - loss: 3.6141 - accuracy: 0.2186 - val_loss: 2.7680 - val_accuracy: 0.4549
Epoch 14/20 255/255 [==============================] - 86s 335ms/step - loss: 3.4687 - accuracy: 0.2318 - val_loss: 2.5450 - val_accuracy: 0.4905
Epoch 15/20 255/255 [==============================] - 86s 339ms/step - loss: 3.3144 - accuracy: 0.2591 - val_loss: 2.3992 - val_accuracy: 0.5353
Epoch 16/20 255/255 [==============================] - 86s 336ms/step - loss: 3.1640 - accuracy: 0.2890 - val_loss: 2.2414 - val_accuracy: 0.5648
Epoch 17/20 255/255 [==============================] - 86s 337ms/step - loss: 3.0233 - accuracy: 0.3134 - val_loss: 2.0857 - val_accuracy: 0.5979
Epoch 18/20 255/255 [==============================] - 86s 337ms/step - loss: 2.8999 - accuracy: 0.3296 - val_loss: 1.9489 - val_accuracy: 0.6298
Epoch 19/20 255/255 [==============================] - 86s 337ms/step - loss: 2.7453 - accuracy: 0.3691 - val_loss: 1.7769 - val_accuracy: 0.6575
Epoch 20/20 255/255 [==============================] - 86s 336ms/step - loss: 2.6375 - accuracy: 0.3886 - val_loss: 1.6983 - val_accuracy: 0.6832
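Two numbers in the log above can be sanity-checked with simple arithmetic: the 255 steps per epoch, and the starting loss, which should sit near the cross-entropy of a uniform guess over 196 classes. The sketch below uses only the image counts, batch size, and class count reported earlier; note that the training loss is scaled by the class weights, so only the validation loss is directly comparable to the uniform-guess value.

```python
import math

n_train, n_val = 8144, 1629   # image counts reported by the generators
batch_size = 32
n_classes = 196

steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)
uniform_loss = math.log(n_classes)  # cross-entropy of a uniform guess

print(steps_per_epoch, validation_steps, round(uniform_loss, 4))
```

The first-epoch validation loss of 5.2545 is close to this uniform-guess value, which is consistent with a freshly initialized classification head.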
plot_training_history(resnet_history_fine_tune)
num_samples = len(df_val)
# Stack the preloaded validation images into one float32 array.
# These images bypass val_generator, so they must be preprocessed exactly as the training inputs were.
X_val = np.array(df_val['image'].tolist()).astype(np.float32)
# True class indices, taken from the one-hot encoded column
y_val_true = np.array([np.argmax(label) for label in df_val['label_categorical']])
y_val_pred = []
for i in range(0, num_samples, batch_size):
    batch_imgs = X_val[i:i+batch_size]
    preds = resnet_tuned_model.predict(batch_imgs, verbose=0)
    batch_preds = np.argmax(preds, axis=1)
    y_val_pred.extend(batch_preds)
y_val_pred = np.array(y_val_pred)
print("X_val shape:", X_val.shape)
print("X_val dtype:", X_val.dtype)
X_val shape: (1629, 224, 224, 3)
X_val dtype: float32
target_names = label_encoder.classes_ if 'label_encoder' in globals() else None
resnet_tuned_report = classification_report(
y_val_true, y_val_pred,
target_names=target_names,
output_dict=True,
zero_division=1 # Avoid divide-by-zero errors
)
#print("Unique y_true:", np.unique(y_val_true))
#print("Unique y_pred:", np.unique(y_val_pred))
#unique_preds, counts = np.unique(y_val_pred, return_counts=True)
#print("Predicted class distribution:", dict(zip(unique_preds, counts)))
df_resnet_tuned_report = pd.DataFrame(resnet_tuned_report).transpose()
acc = accuracy_score(y_val_true, y_val_pred)
df_resnet_tuned_report.loc["overall_accuracy"] = [acc, None, None, None]
df_resnet_tuned_report.to_csv("resnet_tuned_classification_report.csv")
print(f"Tuned ResNet Accuracy: {acc:.4f}")
print("Average Resnet Summary Metrics:")
print(df_resnet_tuned_report.tail(3)[["precision", "recall", "f1-score"]])
Tuned ResNet Accuracy: 0.0055
Average Resnet Summary Metrics:
precision recall f1-score
macro avg 0.989824 0.005102 0.000056
weighted avg 0.987753 0.005525 0.000061
overall_accuracy 0.005525 NaN NaN
# Use the ResNet validation labels and predictions (not the GoogLeNet y_true / y_pred)
cm = confusion_matrix(y_val_true, y_val_pred)
df_support = df_resnet_tuned_report.iloc[:-3]
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
if target_names is not None:
    top_10_indices = [np.where(target_names == cls)[0][0] for cls in top_10_classes]
else:
    top_10_indices = list(map(int, top_10_classes))  # fallback if no class names
cm_top10 = cm[np.ix_(top_10_indices, top_10_indices)]
plt.figure(figsize=(10, 8))
sns.heatmap(cm_top10, annot=True, fmt='d',
xticklabels=top_10_classes,
yticklabels=top_10_classes,
cmap='Blues')
plt.title("Resnet Tuned - Confusion Matrix (Top 10 Classes)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
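ResNet Fine-Tuned Summary¶
- In this evaluation the tuned ResNet scored lower than the untuned one: accuracy fell from 0.74% (untuned) to 0.55% (tuned), which is about the chance level of 0.51% (1/196) for 196 classes.
- Precision is misleadingly high because the model predicts very few classes and zero_division=1 scores every never-predicted class as 1.0, while recall and F1-scores are nearly zero in both cases.
- During training, however, the same model reached a validation accuracy of 68.32% on val_generator at epoch 20. A chance-level score in the manual evaluation therefore points to a mismatch between the evaluation inputs and the training pipeline, such as a different label-to-index mapping or different image preprocessing, rather than to underfitting. Class imbalance may contribute, but it is unlikely to explain a gap of this size on its own.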
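One way a label mismatch of this kind can arise is when the data generator and a separate encoder number the same classes in different orders. This is an assumption about the cause, not something the notebook confirms. The sketch below shows how predictions expressed in generator indices could be translated into encoder indices by going through the class name; the function name and both mappings are hypothetical.

```python
def remap_predictions(pred_indices, class_indices, encoder_classes):
    """Translate generator class indices into label-encoder indices via the class name."""
    index_to_name = {index: name for name, index in class_indices.items()}
    name_to_encoded = {str(name): i for i, name in enumerate(encoder_classes)}
    return [name_to_encoded[index_to_name[p]] for p in pred_indices]


# Hypothetical example: the generator sorted string labels ('10' before '2'),
# while the encoder sorted the same labels numerically.
class_indices = {"1": 0, "10": 1, "2": 2}
encoder_classes = [1, 2, 10]

remapped = remap_predictions([0, 1, 2], class_indices, encoder_classes)
print(remapped)
```

If the two orderings already agree, the function returns the predictions unchanged, so it is safe to apply as a check before computing metrics.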
Model Decision¶
# First, add a column to identify each model
df_resnet_classification_report_tail = df_resnet_classification_report.tail(4).copy()
df_resnet_classification_report_tail['Model'] = 'ResNet Untuned (10 Epochs)'
df_resnet_tuned_report_tail = df_resnet_tuned_report.tail(4).copy()
df_resnet_tuned_report_tail['Model'] = 'ResNet Tuned (20 Epochs)'
df_googlenet_classification_report_tail = df_googlenet_classification_report.tail(4).copy()
df_googlenet_classification_report_tail['Model'] = 'GoogLeNet Untuned (10 Epochs)'
df_googlenet_tuned_report_tail = df_googlenet_tuned_report.tail(4).copy()
df_googlenet_tuned_report_tail['Model'] = 'GoogLeNet Tuned (20 Epochs)'
df_combined_tail = pd.concat([
df_resnet_classification_report_tail,
df_resnet_tuned_report_tail,
df_googlenet_classification_report_tail,
df_googlenet_tuned_report_tail
])
df_combined_tail = df_combined_tail.reset_index().rename(columns={'index': 'Metric'})
df_combined_tail = df_combined_tail[['Model', 'Metric', 'precision', 'recall', 'f1-score']]
df_combined_tail.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Model      16 non-null     object
 1   Metric     16 non-null     object
 2   precision  16 non-null     float64
 3   recall     12 non-null     float64
 4   f1-score   12 non-null     float64
dtypes: float64(3), object(2)
memory usage: 768.0+ bytes
df_combined_tail
| | Model | Metric | precision | recall | f1-score |
|---|---|---|---|---|---|
| 0 | ResNet Untuned (10 Epochs) | accuracy | 0.007366 | 0.007366 | 0.007366 |
| 1 | ResNet Untuned (10 Epochs) | macro avg | 0.964518 | 0.007240 | 0.000442 |
| 2 | ResNet Untuned (10 Epochs) | weighted avg | 0.968385 | 0.007366 | 0.000582 |
| 3 | ResNet Untuned (10 Epochs) | overall_accuracy | 0.007366 | NaN | NaN |
| 4 | ResNet Tuned (20 Epochs) | accuracy | 0.005525 | 0.005525 | 0.005525 |
| 5 | ResNet Tuned (20 Epochs) | macro avg | 0.989824 | 0.005102 | 0.000056 |
| 6 | ResNet Tuned (20 Epochs) | weighted avg | 0.987753 | 0.005525 | 0.000061 |
| 7 | ResNet Tuned (20 Epochs) | overall_accuracy | 0.005525 | NaN | NaN |
| 8 | GoogLeNet Untuned (10 Epochs) | accuracy | 0.247391 | 0.247391 | 0.247391 |
| 9 | GoogLeNet Untuned (10 Epochs) | macro avg | 0.354999 | 0.242632 | 0.223415 |
| 10 | GoogLeNet Untuned (10 Epochs) | weighted avg | 0.376060 | 0.247391 | 0.237403 |
| 11 | GoogLeNet Untuned (10 Epochs) | overall_accuracy | 0.247391 | NaN | NaN |
| 12 | GoogLeNet Tuned (20 Epochs) | accuracy | 0.003069 | 0.003069 | 0.003069 |
| 13 | GoogLeNet Tuned (20 Epochs) | macro avg | 0.689119 | 0.002145 | 0.000543 |
| 14 | GoogLeNet Tuned (20 Epochs) | weighted avg | 0.676990 | 0.003069 | 0.000794 |
| 15 | GoogLeNet Tuned (20 Epochs) | overall_accuracy | 0.003069 | NaN | NaN |
Model Selection¶
- Among all models tested, the untuned GoogLeNet (10 epochs) performed best, with an accuracy of 24.74% and a macro-average F1-score of 0.2234, far ahead of every other model.
- Despite additional training and tuning, the tuned ResNet and tuned GoogLeNet models both scored below 1% accuracy in this evaluation.
- Final model selected for test evaluation: the untuned GoogLeNet trained for 10 epochs.
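The selection rule described above amounts to taking the model with the highest macro-average F1-score. A minimal sketch of that comparison follows, using the macro avg values copied from the comparison table; the dictionary name is hypothetical.

```python
# Macro-average F1-scores copied from the comparison table.
macro_f1 = {
    "ResNet Untuned (10 Epochs)": 0.000442,
    "ResNet Tuned (20 Epochs)": 0.000056,
    "GoogLeNet Untuned (10 Epochs)": 0.223415,
    "GoogLeNet Tuned (20 Epochs)": 0.000543,
}

best_model = max(macro_f1, key=macro_f1.get)
print(best_model, macro_f1[best_model])
```

Macro F1 is used here rather than precision because the precision column is inflated for the models that predict very few classes.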
Untuned GoogleNet Model Validated Against Test Data Set¶
batch_size = 16
# Step 1: Get true labels
y_test_true = np.array([np.argmax(label) for label in df_testing['label_categorical']])
# Step 2: Predict in batches
test_images = df_testing['image'].tolist()  # build the list once instead of once per batch
y_test_pred = []
for i in range(0, len(df_testing), batch_size):
    batch_imgs = np.array(test_images[i:i+batch_size])
    preds = googlenet_model.predict(batch_imgs, verbose=0)
    batch_preds = np.argmax(preds, axis=1)
    y_test_pred.extend(batch_preds)
y_test_pred = np.array(y_test_pred)
target_names = label_encoder.classes_ if 'label_encoder' in globals() else None
final_googlenet_untuned_report = classification_report(
    y_test_true, y_test_pred,
    target_names=target_names,
    output_dict=True,
    zero_division=1  # Undefined metrics are scored as 1; this inflates precision for classes that are never predicted
)
df_final_googlenet_untuned_report = pd.DataFrame(final_googlenet_untuned_report).transpose()
acc = accuracy_score(y_test_true, y_test_pred)
df_final_googlenet_untuned_report.loc["overall_accuracy"]= [acc, None, None, None]
df_final_googlenet_untuned_report.to_csv("df_final_googlenet_untuned_classification_report.csv")
print(f"Final GoogleNet(Untuned) against test data Accuracy: {acc:.4f}")
print("Final Untuned GoogleNet metrics against test data set:")
print(df_final_googlenet_untuned_report.tail(3)[["precision", "recall", "f1-score"]])
Final GoogleNet(Untuned) against test data Accuracy: 0.2342
Final Untuned GoogleNet metrics against test data set:
precision recall f1-score
macro avg 0.319407 0.233567 0.223931
weighted avg 0.318643 0.234175 0.224003
overall_accuracy 0.234175 NaN NaN
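The report above is built with `zero_division=1`, which scores precision as 1 for any class the model never predicts. The sketch below is a hand-written macro precision (not scikit-learn's implementation; the function name `macro_precision` and the toy labels are assumptions for illustration) that shows how much this setting can move the macro average. It may help explain why the tuned models in the comparison table show very high macro precision alongside near-zero recall, assuming their reports were generated with the same setting.

```python
def macro_precision(y_true, y_pred, n_classes, zero_division):
    """Unweighted mean of per-class precision.

    Classes with no predictions have undefined precision and
    receive the value of `zero_division` instead.
    """
    per_class = []
    for c in range(n_classes):
        predicted = sum(1 for p in y_pred if p == c)
        correct = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        if predicted == 0:
            per_class.append(float(zero_division))
        else:
            per_class.append(correct / predicted)
    return sum(per_class) / n_classes


# Toy case: four classes, but the model always predicts class 0.
y_true = [0, 1, 2, 3]
y_pred = [0, 0, 0, 0]

optimistic = macro_precision(y_true, y_pred, 4, zero_division=1)
pessimistic = macro_precision(y_true, y_pred, 4, zero_division=0)
print(optimistic, pessimistic)  # 0.8125 0.0625
```

With a collapsed model like this one, recall and F1 are the more trustworthy columns, since they are not affected by the fill value in the same way.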
# Build the confusion matrix from the test-set labels and predictions
cm = confusion_matrix(y_test_true, y_test_pred)
df_support = df_final_googlenet_untuned_report.iloc[:-3]
top_10_classes = df_support.sort_values("support", ascending=False).head(10).index.tolist()
if target_names is not None:
    top_10_indices = [np.where(target_names == cls)[0][0] for cls in top_10_classes]
else:
    top_10_indices = list(map(int, top_10_classes))  # fallback if no class names
cm_top10 = cm[np.ix_(top_10_indices, top_10_indices)]
plt.figure(figsize=(10, 8))
sns.heatmap(cm_top10, annot=True, fmt='d',
            xticklabels=top_10_classes,
            yticklabels=top_10_classes,
            cmap='Blues')
plt.title("Untuned GoogLeNet - Confusion Matrix (Top 10 Classes, Test Data Set)")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.tight_layout()
plt.show()
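A top-10 submatrix only shows predictions that land inside the same ten classes, so a row can look cleaner than it really is. The sketch below, in plain Python with a made-up 3-class matrix, extracts a submatrix and reports what fraction of each selected class's samples remain visible. The helper names `submatrix` and `row_coverage` are hypothetical and not part of the notebook.

```python
def submatrix(cm, indices):
    """Rows and columns of `cm` restricted to `indices` (like cm[np.ix_(idx, idx)])."""
    return [[cm[i][j] for j in indices] for i in indices]


def row_coverage(cm, indices):
    """Fraction of each selected class's samples predicted as one of `indices`."""
    coverage = []
    for i in indices:
        total = sum(cm[i])
        kept = sum(cm[i][j] for j in indices)
        coverage.append(kept / total if total else 0.0)
    return coverage


# Made-up confusion matrix: rows are true classes, columns are predictions.
cm_demo = [
    [5, 1, 4],
    [2, 6, 2],
    [3, 3, 4],
]
selected = [0, 1]

sub = submatrix(cm_demo, selected)
cov = row_coverage(cm_demo, selected)
print(sub)  # [[5, 1], [2, 6]]
print(cov)  # [0.6, 0.8]
```

A low coverage value means most of that class's errors fall on classes outside the plotted subset, which is likely with 196 classes and only ten shown.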
Final Outcome (Test Data Evaluation)¶
- The untuned GoogLeNet model achieved a test accuracy of 23.42%, close to its validation accuracy (~24.74%).
- The macro and weighted average F1-scores (~0.224) give a usable baseline across 196 fine-grained car classes, where uniform random guessing would score roughly 0.5%.
- The small gap between validation and test results suggests that the model has learned meaningful patterns that carry over to unseen data. Absolute performance is still low, however, so further tuning, training, or restructuring is required.
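To put these numbers in context, the short calculation below compares the test accuracy with a chance baseline and with the validation accuracy. It assumes uniform random guessing over the 196 classes as the baseline, which is only approximate if the classes are imbalanced. The variable names are illustrative.

```python
n_classes = 196
val_acc = 0.247391   # validation accuracy of the untuned GoogLeNet
test_acc = 0.234175  # test accuracy reported above

chance = 1 / n_classes    # expected accuracy of uniform random guessing
lift = test_acc / chance  # how many times better than chance
gap = val_acc - test_acc  # validation-to-test drop

print(f"chance={chance:.4f}, lift={lift:.1f}x, gap={gap:.4f}")
# chance=0.0051, lift=45.9x, gap=0.0132
```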
GoogLeNet is a pure classification model and does not support localization or masking of car regions in images¶
RCNN and its hybrids require a base model that can supply region proposals or feature maps for bounding box regression and segmentation¶
Because the current GoogLeNet model is not suitable for the region-based masking required in RCNN workflows, we will move forward with implementing YOLO for object detection and localization.¶